Morris' Blog

2013-03-17 22:50:23Morris

[UVA][二分答案] 11860 - Document Analyzer

留言 0 收藏 0 推薦 0

Document Analyzer

Input: Standard Input

Output: Standard Output

You work in a leading software development company. As you are great in coding, most of the critical tasks are allotted for you. You like the challenge and you feel excited to solve those problems.

Recently your company is developing a project named Document Analyzer. In this project you are assigned a task; of course a critical task. The task is that you are given a document consisting of lowercase letters, numbers and punctuations. You have to analyze the document and separate the words first. Words are consecutive sequences of lower case letters. After listing the words, in the order same as they occurred in the document, you have to number them from 1, 2, ... n. After that you have to find the range p and q (p ≤ q) such that all kinds of words occur between p and q (inclusive). If there are multiple such solutions you have to find the one where the difference of p and q is smallest. If still there is a tie, then find the solution where p is smallest.

Since you do have other works to do, you have to solve this task within 5 hours.

Input

First line of input will contain T (≤ 20) denoting number of documents.

Each document will be denoted by one or more lines, each line having no more than 150 characters. A document will contain either lowercase letters or numbers or punctuations. The last line of a document will contain the word ‘END’ which is of course not the part of the document. You can assume that a document will contain between 1 and 10⁵ words (inclusive).

Output

For each document, print the document number first. After that, print p and q as described above. For exact formatting, see the samples.

Sample Input Output for Sample Input

1. a case is a case,

2. case is not a case~

END

a b c d e

END

a@#$a^%a a a

b b----b b++12b

END

Document 1: 6 9

Document 2: 1 5

Document 3: 5 6

The input file size is about 9 MB, so use faster input techniques.

Problemsetter: Jane Alam Jan, Special Thanks: Abdullah-al-Mahmud

題目要求找一個區間，這個區間出現過整個文本的所有單字。

其中區間長度越小越好，同時左端點越小越好。

二分長度 O(logn)，然後 O(n) 檢查。

效率 O(nlogn)

#include <stdio.h>
#include <iostream>
#include <map>
#include <string.h>
using namespace std;
map<string, int> WORD;
int list[100000];
int mark[100000];
void WORDclear(int i, int L) {
    while(i >= 0 && L >= 0)
        mark[list[i]] = 0, i--, L--;
}
int check(int L, int n, int tot) {
    int i, cnt = tot;
    map<string, int>::iterator it;
    for(i = 0; i < L; i++) {
        if(mark[list[i]] == 0)
            cnt--;
        mark[list[i]]++;
    }
    if(cnt == 0) {
        WORDclear(L-1, L);
        return 0;
    }
    for(i = L; i < n; i++) {
        if(mark[list[i]] == 0)
            cnt--;
        mark[list[i]]++;
        if(mark[list[i-L]] == 1)
            cnt++;
        mark[list[i-L]]--;
        if(cnt == 0) {
            WORDclear(i, L);
            return i-L+1;
        }
    }
    WORDclear(n-1, L);
    return -1;
}
int main() {
    int t, i, j, k, cases = 0;
    char line[1024], buf[1024];
    scanf("%d", &t);
    gets(line);
    while(t--) {
        WORD.clear();
        int n = 0, label = 1, x;
        map<string, int>::iterator it;
        while(gets(line) && strcmp("END", line)) {
            for(i = 0; line[i]; i++) {
                if(line[i] >= 'a' && line[i] <= 'z') {
                    int idx = 0;
                    while(line[i] && line[i] >= 'a' && line[i] <= 'z')
                        buf[idx++] = line[i++];
                    buf[idx] = '\0', i--;
                    it = WORD.find(buf);
                    if(it == WORD.end()) {
                        x = label;
                        WORD[buf] = label++;
                    } else
                        x = it->second;
                    list[n++] = x;
                }
            }
        }
        int L = 0, R = n, m;
        int p, q, tp;
        while(L <= R) { // length of [p, q]
            m = (L+R)/2;
            tp = check(m, n, label-1);
            if(tp >= 0)
                R = m-1, p = tp, q = p+m-1;
            else
                L = m+1;
        }
        printf("Document %d: %d %d\n", ++cases, p+1, q+1);
    }
    return 0;
}

上一篇：[UVA][射線法、線段交] 10348 - Submarines

下一篇：[UVA][Queue] 10901 - Ferry Loading III

Morris' Blog

[UVA][二分答案] 11860 - Document Analyzer

Input

Output

Sample Input Output for Sample Input

你可能感興趣的文章

大難不死

小記事

夜班公車

憐憫？

PRAY