2013-04-08 19:37:00Morris

[UVA][AC自動機] 1254 - Top 10

We all use search everyday; to find a file in a directory, to find an email in the inbox, to find a song in
a playlist. Search is more than just a linear scan through a list of texts in a dictionary; It's binary
search, it's indexing, it's using your full text search algorithms to solve one of the hardest problems
we know.
-- adapted from NUMB3RS

Given a dictionary containing less than N = 20000 words labeled from 1 to N. Each word consists of lowercase characters (from 'a' to 'z') with arbitrary length. The total number of characters in the dictionary is at most 100,000. Your task is to answer at most Q = 100000 queries. Each query qi is also a word (as defined above). For each query, you have to print the "Top 10" words in the dictionary with the following rules:

  • All the words in the "Top 10" have to contain the substring qi.
  • All the words in the "Top 10" have to be sorted in this order:
    1. The words with shorter length come first, if they have equal length then
    2. The lexicographically smaller words come first, otherwise
    3. The words with smaller label come first.
  • If the number of words in the dictionary that contains the substring qi is less than 10 then print all the words otherwise, print only the top-10 words (note: the words are printed using their labels).
  • If there is no word in the dictionary that contains the substring qi then print "-1" (without the quotes).

Input 

The first line contains the number N. The next N lines contains the N words in the dictionary (the ith line is the word with label i). The next line contains the number Q followed by the Q lines containing the queries.

Output 

For each query, print one line containing the labels of the "Top 10" words (separated by a space) in the dictionary using the rules defined above.

Sample Input 

17
acm
icpc
regional
asia
jakarta
two
thousand
and
nine
arranged
by
universitas
bina
nusantara
especially
for
you
5
a
an
win
b
z

Sample Output 

1 8 4 13 5 10 3 7 14 15
8 10 7 14
-1
11 13
-1


將 word 與 query 接建成 Trie, 因此有兩棵不同的 Trie,
然後將 query 的 Trie 建立 AC 自動機, 因此將單字依序在 AC 自動機裡面跑就可以了。

但題目只要求前 Top 10, 之所以將 word 建立成 Trie, 就是在 BFS 搜索的時候,
會將先長度短的先輸出, 相同時字典順序小的, 接著是編號小的。

依照 BFS 的順序跑一次 AC自動機。

剪枝的地方很明顯可以想到的是,在匹配的過程中, 失敗指針已經被批配過了,
那就不用繼續匹配失敗指針, 同時如果那筆 query 已經存在 Top 10 了,
也不用繼續匹配失敗指針的。
// 0.120 s Rank3
#include <stdio.h>
#include <string.h>
#include <vector>
#include <queue>
#include <iostream>
using namespace std;
string word[20000];
vector<int> Top10[100000];
struct Node {
Node *next[26], *fail;
vector<int> label;
int matched;
Node() {
fail = NULL;
matched = -1;
label.clear();
memset(next, 0, sizeof(next));
}
};
void insertTrie(const char str[], Node *root, int label) {
static Node *p;
static int i, idx;
p = root;
for(i = 0; str[i]; i++) {
idx = str[i]-'a';
if(p->next[idx] == NULL) {
p->next[idx] = new Node();
}
p = p->next[idx];
}
p->label.push_back(label);
}
void freeTrie(Node *root) {
queue<Node*> Q;
Node *nd;
while(!Q.empty()) {
nd = Q.front(), Q.pop();
for(int i = 0; i < 26; i++)
if(nd->next[i] != NULL)
Q.push(nd->next[i]);
nd->label.clear();
delete nd;
}
}
void buildACautomation(Node *root) {
queue<Node*> Q;
Node *nd, *p;
root->fail = NULL;
Q.push(root);
while(!Q.empty()) {
nd = Q.front(), Q.pop();
for(int i = 0; i < 26; i++) {
if(nd->next[i] == NULL)
continue;
Q.push(nd->next[i]);
p = nd->fail;
while(p != NULL && p->next[i] == NULL)
p = p->fail;
if(p == NULL)
nd->next[i]->fail = root;
else
nd->next[i]->fail = p->next[i];
}
}
}
void quertACautomaiton(const char* str, Node *root, int number) {
static Node *p, *q;
static int i, idx;
p = root;
for(i = 0; str[i]; i++) {
idx = str[i]-'a';
while(p != root && p->next[idx] == NULL)
p = p->fail;
p = p->next[idx];
p = (p == NULL) ? root : p;
q = p;
while(q != root && q->matched != number) {
int bflag = 0;
for(vector<int>::iterator it = q->label.begin();
it != q->label.end(); it++) {
if(Top10[*it].size() == 10) {
bflag = 1;
break;
}
Top10[*it].push_back(number);
}
if(bflag) break;
q->matched = number;
q = q->fail;
}
}
}
void sortTrie(Node *root, Node *qroot) {
// 1.shorter length 2. lexicographically 3. label small
Node *tn;
queue<Node*> Q;
Q.push(root);
while(!Q.empty()) {
tn = Q.front(), Q.pop();
for(int i = 0; i < 26; i++) {
if(tn->next[i] == NULL)
continue;
Q.push(tn->next[i]);
}
for(vector<int>::iterator it = tn->label.begin();
it != tn->label.end(); it++) {
quertACautomaiton(word[*it].c_str(), qroot, *it);
//cout << word[*it] << endl;
}
}
}
int main() {
int n, m;
int i, j;
char s[100005];
while(scanf("%d", &n) == 1) {
while(getchar() != '\n');
Node *root = new Node();
for(i = 0; i < n; i++) {
gets(s);
word[i] = s;
insertTrie(s, root, i);
}
scanf("%d", &m);
while(getchar() != '\n');
Node *qroot = new Node();
for(i = 0; i < m; i++) {
gets(s);
insertTrie(s, qroot, i);
}
buildACautomation(qroot);
sortTrie(root, qroot);
for(i = 0; i < m; i++) {
if(Top10[i].size() == 0)
puts("-1");
else {
printf("%d", Top10[i][0]+1);
for(j = 1; j < Top10[i].size(); j++)
printf(" %d", Top10[i][j]+1);
puts("");
}
Top10[i].clear();
}
freeTrie(root);
freeTrie(qroot);
}
return 0;
}