[UVA][二分答案] 11860 - Document Analyzer
B |
Document Analyzer Input: Standard Input Output: Standard Output |
You work in a leading software development company. As you are great in coding, most of the critical tasks are allotted for you. You like the challenge and you feel excited to solve those problems.
Recently your company is developing a project named Document Analyzer. In this project you are assigned a task; of course a critical task. The task is that you are given a document consisting of lowercase letters, numbers and punctuations. You have to analyze the document and separate the words first. Words are consecutive sequences of lower case letters. After listing the words, in the order same as they occurred in the document, you have to number them from 1, 2, ... n. After that you have to find the range p and q (p ≤ q) such that all kinds of words occur between p and q (inclusive). If there are multiple such solutions you have to find the one where the difference of p and q is smallest. If still there is a tie, then find the solution where p is smallest.
Since you do have other works to do, you have to solve this task within 5 hours.
Input
First line of input will contain T (≤ 20) denoting number of documents.
Each document will be denoted by one or more lines, each line having no more than 150 characters. A document will contain either lowercase letters or numbers or punctuations. The last line of a document will contain the word ‘END’ which is of course not the part of the document. You can assume that a document will contain between 1 and 105 words (inclusive).
Output
For each document, print the document number first. After that, print p and q as described above. For exact formatting, see the samples.
Sample Input Output for Sample Input
3 1. a case is a case, 2. case is not a case~ END a b c d e END a@#$a^%a a a b b----b b++12b END |
Document 1: 6 9 Document 2: 1 5 Document 3: 5 6 |
The input file size is about 9 MB, so use faster input techniques.
Problemsetter: Jane Alam Jan, Special Thanks:
Abdullah-al-Mahmud
題目要求找一個區間,這個區間出現過整個文本的所有單字。
其中區間長度越小越好,同時左端點越小越好。
二分長度 O(logn),然後 O(n) 檢查。
效率 O(nlogn)
#include <stdio.h>
#include <iostream>
#include <map>
#include <string.h>
using namespace std;
map<string, int> WORD;
int list[100000];
int mark[100000];
void WORDclear(int i, int L) {
while(i >= 0 && L >= 0)
mark[list[i]] = 0, i--, L--;
}
int check(int L, int n, int tot) {
int i, cnt = tot;
map<string, int>::iterator it;
for(i = 0; i < L; i++) {
if(mark[list[i]] == 0)
cnt--;
mark[list[i]]++;
}
if(cnt == 0) {
WORDclear(L-1, L);
return 0;
}
for(i = L; i < n; i++) {
if(mark[list[i]] == 0)
cnt--;
mark[list[i]]++;
if(mark[list[i-L]] == 1)
cnt++;
mark[list[i-L]]--;
if(cnt == 0) {
WORDclear(i, L);
return i-L+1;
}
}
WORDclear(n-1, L);
return -1;
}
int main() {
int t, i, j, k, cases = 0;
char line[1024], buf[1024];
scanf("%d", &t);
gets(line);
while(t--) {
WORD.clear();
int n = 0, label = 1, x;
map<string, int>::iterator it;
while(gets(line) && strcmp("END", line)) {
for(i = 0; line[i]; i++) {
if(line[i] >= 'a' && line[i] <= 'z') {
int idx = 0;
while(line[i] && line[i] >= 'a' && line[i] <= 'z')
buf[idx++] = line[i++];
buf[idx] = '\0', i--;
it = WORD.find(buf);
if(it == WORD.end()) {
x = label;
WORD[buf] = label++;
} else
x = it->second;
list[n++] = x;
}
}
}
int L = 0, R = n, m;
int p, q, tp;
while(L <= R) { // length of [p, q]
m = (L+R)/2;
tp = check(m, n, label-1);
if(tp >= 0)
R = m-1, p = tp, q = p+m-1;
else
L = m+1;
}
printf("Document %d: %d %d\n", ++cases, p+1, q+1);
}
return 0;
}