[UVA][dp] 11022 - String Factoring
Problem B
String Factoring
Input: Standard Input
Output: Standard Output
Spotting patterns in seemingly random strings is a problem with many applications. E.g. in our efforts to understand the genome we investigate the structure of DNA strings. In data compression we are interested in finding repetitions, so the data can be represented more efficiently with pointers. Another plausible example arises from the area of artificial intelligence: how to interpret information given to you in a language you do not know. The natural thing to do in order to decode the message would be to look for recurrences in it. So if the SETI project (the Search for Extra Terrestrial Intelligence) ever gets a signal in the H21-spectra, we need to know how to decompose it.
One way of capturing the redundancy of a string is to find its factoring. If two or more identical substrings A follow each other in a string S, we can represent this part of S as the substring A, embraced by parentheses, raised to the power of the number of its recurrences. E.g. the string DOODOO can be factored as (DOO)^2, but also as (D(O)^2)^2. Naturally, the latter factoring is considered better since it cannot be factored any further. We say that a factoring is irreducible if it does not contain any consecutive repetition of a substring. A string may have several irreducible factorings, as seen by the example string POPPOP. It can be factored as (POP)^2, as well as PO(P)^2OP. The first factoring has a shorter representation and motivates the following definition. The weight of a factoring equals the number of characters in it, excluding the parentheses and the exponents. Thus the weight of (POP)^2 is 3, whereas PO(P)^2OP has weight 5. A maximal factoring is a factoring with the smallest possible weight. It should be clear that a maximal factoring is always an irreducible one, but there may still be several maximal factorings. E.g. the string ABABA has two maximal factorings (AB)^2A and A(BA)^2.
Input
The input consists of several rows. The rows each hold one string of at least one, but less than 80 characters from the capital alphabet A-Z. The input is terminated by a row containing the character '*' only. There will be no white space characters in the input.
Output
For each string in the input, output one line containing the weight of a maximal factoring of the string.
Sample Input
PRATTATTATTIC
GGGGGGGGG
PRIME
BABBABABBABBA
ARPARPARPARPAR
*
Output for Sample Input
6
1
5
6
5
Swedish National Contest
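This one is billed as dynamic programming.
However, the solutions published online are reported as O(n^3), while my implementation here comes out closer to O(n^4).
First consider concatenation: let dp[i][j] be the minimum weight needed for the substring s[i...j].
That gives dp[i][j] = min(dp[i][k]+dp[k+1][j]) (i <= k < j)
The other possibility is collapsing the substring into a power:
dp[i][j] = min(dp[i][k]), if s[i...j] == s[i...k]^((j-i+1)/(k-i+1)) (i <= k < j)
The test data for this problem is basically tiny and only checks correctness, so I did not worry much about the complexity.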
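The power condition s[i...j] == s[i...k]^((j-i+1)/(k-i+1)) can be tested without building any strings, by comparing each character with the one a full block earlier. A minimal sketch, assuming a hypothetical helper named isPowerOf that is not part of the solution in this post:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an illustration, not the post's code): returns true
// when s[i..j] equals s[i..k] repeated (j-i+1)/(k-i+1) times.
bool isPowerOf(const std::string &s, int i, int j, int k) {
    assert(0 <= i && i <= k && k <= j && j < (int)s.size());
    int len = j - i + 1;    // length of the whole substring
    int base = k - i + 1;   // length of the candidate repeated block
    if (len % base != 0)
        return false;       // the block must tile the substring exactly
    // Every character must equal the character one block earlier.
    for (int p = i + base; p <= j; p++)
        if (s[p] != s[p - base])
            return false;
    return true;
}
```

For DOODOO, isPowerOf(s, 0, 5, 2) holds because DOO repeats twice, while isPowerOf(s, 0, 5, 1) fails at the third character.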
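#include <stdio.h>
#include <string.h>
#include <algorithm>
using namespace std;
// dp[l][r]: minimum weight of s[l..r]; 0 means "not computed yet",
// which is safe because every real weight is at least 1.
int dp[105][105];
char s[105];
int dfs(int l, int r) {
    if(l == r)
        return 1;
    if(dp[l][r])
        return dp[l][r];
    int i, j, k;
    int &ret = dp[l][r];
    ret = 0xfffffff;
    // Case 1: concatenation, split s[l..r] into s[l..i] and s[i+1..r].
    for(i = l; i < r; i++)
        ret = min(ret, dfs(l, i) + dfs(i+1, r));
    // Case 2: power, s[l..r] is a repetition of its prefix of length i.
    int sublen = r-l+1;
    for(i = 1; i <= sublen; i++) {
        if(sublen%i == 0) {
            // j cycles through 0..i-1 so s[k] is compared with the block s[l..l+i-1].
            for(k = l, j = 0; k <= r; k++) {
                if(s[k] != s[j+l])
                    break;
                j++;
                if(j >= i) j = 0;
            }
            // Skip i == sublen, where the "block" is the whole substring.
            if(k == r+1 && r != l+i-1)
                ret = min(ret, dfs(l, l+i-1));
        }
    }
    return ret;
}
int main() {
    while(scanf("%104s", s) == 1) {
        if(!strcmp(s, "*"))
            break;
        memset(dp, 0, sizeof(dp));
        int len = strlen(s);
        printf("%d\n", dfs(0, len-1));
    }
    return 0;
}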
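The same two recurrences can also be filled bottom-up, by increasing substring length, instead of with memoized recursion. What follows is only a sketch of that variant: the function name minWeight and the use of std::string and std::vector are illustrative assumptions, and it has not been submitted to the judge.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical bottom-up variant of the two recurrences (illustration only).
int minWeight(const std::string &s) {
    int n = (int)s.size();
    if (n == 0)
        return 0;
    std::vector<std::vector<int>> dp(n, std::vector<int>(n, 0));
    for (int i = 0; i < n; i++)
        dp[i][i] = 1;                        // a single character has weight 1
    for (int len = 2; len <= n; len++) {
        for (int i = 0; i + len - 1 < n; i++) {
            int j = i + len - 1;
            int best = len;                  // upper bound: no factoring at all
            // Concatenation: s[i..k] followed by s[k+1..j].
            for (int k = i; k < j; k++)
                best = std::min(best, dp[i][k] + dp[k + 1][j]);
            // Power: s[i..j] is a proper repetition of a block of length b.
            for (int b = 1; b < len; b++) {
                if (len % b != 0)
                    continue;
                bool periodic = true;
                for (int p = i + b; p <= j && periodic; p++)
                    periodic = (s[p] == s[p - b]);
                if (periodic)
                    best = std::min(best, dp[i][i + b - 1]);
            }
            dp[i][j] = best;
        }
    }
    return dp[0][n - 1];
}
```

Shorter substrings are always finished before longer ones, so every dp entry read on the right-hand side is already final. Calling minWeight on the sample strings should reproduce the sample output, for example 6 for PRATTATTATTIC via PR(A(T)^2)^3IC.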