[UVA][Trie] 760 - DNA Sequencing
DNA Sequencing
DNA Sequencing |
A DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder whose sides, made of sugar and phosphate molecules, are connected by rungs of nitrogen-containing chemicals called bases. Each strand is a linear arrangement of repeating similar units called nucleotides, which are each composed of one sugar, one phosphate, and a nitrogenous base. Four different bases are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). The particular order of the bases arranged along the sugar-phosphate backbone is called the DNA sequence; the sequence specifies the exact genetic instructions required to create a particular organism with its own unique traits.
Geneticists often compare DNA strands and are interested in finding the
longest common
base sequence in the two strands. Note that these strands can be represented
as strings
consisting of the letters a, t, c and g. So, the
longest common sequence in the two
strands atgc and tga is tg. It is entirely possible that
two different common sequences
exist that are the same length and are the longest possible common
sequences. For
example in the strands atgc and gctg, the longest common
sequences are gc
and tg.
Input and Output
Write a program that accepts as input two strings representing DNA strands, and prints as output the longest common sequence(s) in lexicographical order.If there isn't any common sequence between the two strings, just print: ``No common sequence."
If there are more than one test cases, it must be a blank line between two consecutive, both in input and output files.
The strings are at most 300 characters-long.
Sample Input
atgc tga atgc gctg
Sample Output
tg gc tg
原本這題要用 SA 去做的,但數據很小,length < 300。
用 O(n*n) 建樹,一樣用 O(n*n) 去 match 這棵樹。
然後輸出即可。
這題很妙的是,我一開始就 RE 了,但是 UVa 上總是 WA,讓我一直抓不到 BUG。
#include <stdio.h>
#include <string.h>
struct TrieNd{
int LK[26];
int ac;
} ND[100000];
int TrieSize = 0;
void insTrie(char s[]) {
static int i, idx, v;
for(i = 0, idx = 0; s[i]; i++) {
v = s[i]-'a';
if(!ND[idx].LK[v]) {
TrieSize++;
memset(&ND[TrieSize], 0, sizeof(ND[0]));
ND[idx].LK[v] = TrieSize;
}
idx = ND[idx].LK[v];
}
}
void matchTrie(char s[]) {
static int i, idx, v;
for(i = 0, idx = 0; s[i]; i++) {
v = s[i]-'a';
ND[idx].ac = 1;
if(ND[idx].LK[v])
idx = ND[idx].LK[v];
else
break;
}
ND[idx].ac = 1;
}
int mxlen = -1;
void dfs(int nd, int dep) {
int i;
for(i = 0; i < 26; i++) {
if(ND[nd].LK[i] && ND[ND[nd].LK[i]].ac) {
if(dep+1 > mxlen)
mxlen = dep+1;
dfs(ND[nd].LK[i], dep+1);
}
}
}
char path[512];
void dfs2(int nd, int dep) {
int i;
for(i = 0; i < 26; i++) {
if(ND[nd].LK[i] && ND[ND[nd].LK[i]].ac) {
path[dep] = i+'a';
if(dep+1 == mxlen) {
path[dep+1] = '\0';
puts(path);
}
dfs2(ND[nd].LK[i], dep+1);
}
}
}
int main() {
char s1[512], s2[512];
int i, first = 0;
while(gets(s1)) {
gets(s2);
if(first) puts("");
first = 1;
memset(&ND[0], 0, sizeof(ND[0]));
TrieSize = 0;
for(i = 0; s1[i]; i++)
insTrie(s1+i);
for(i = 0; s2[i]; i++)
matchTrie(s2+i);
mxlen = -1;
dfs(0, 0);
if(mxlen > 0)
dfs2(0, 0);
else
puts("No common sequence.");
gets(s1);
}
return 0;
}