[UVA] 892 - Finding words
A common problem when processing incoming text is to isolate the words in the text. This is made more difficult by the punctuation; words have commas, "quote marks", (even brackets) next to them, or hy-phens in the middle of the word. This punctuation doesn't count as letters when the words have to be looked up in a dictionary by the program.
For this problem, you must separate out "clean" words from text, that is, words with no attached or embedded
non-letters. A "word" is any continuous string of non-whitespace characters, with whitespace characters on
each side of it. For this problem, a "whitespace" character is a space character or an end-of-line character,
or the start or end of the file (so that, for example, if the input file consists of 'Anne Bob', where there is a
space character between the A and B but no other, then there are two words, 'Anne' and 'Bob').
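The word definition above can be sketched as a small splitter. This is only an illustration: the function name `splitWords` is hypothetical and is not part of the solution further down, and it treats only the space and the end-of-line character as whitespace, as the statement specifies.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text into "words" as the problem defines them.
// Only ' ' and '\n' count as whitespace; the start and end of the text
// act as word boundaries too.
std::vector<std::string> splitWords(const std::string& text) {
    std::vector<std::string> words;
    std::string cur;
    for (char c : text) {
        if (c == ' ' || c == '\n') {
            if (!cur.empty()) words.push_back(cur);
            cur.clear();
        } else {
            cur += c;  // punctuation is still part of the raw word here
        }
    }
    if (!cur.empty()) words.push_back(cur);  // end of file also ends a word
    return words;
}
```

Calling `splitWords("Anne Bob")` should give the two words from the example in the statement.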
Input
Input will consist of lines with no more than 60 characters in each line. Every line will be terminated by a character which isn't whitespace (which will be followed immediately by an end-of-line character). The input will be terminated by a line consisting of a single '#'.
Output
Output must be the lines of the incoming text, with the non-letters stripped away from each word. A non-letter is any character which is not a letter (a - z and A - Z) and not a whitespace character. Your program must not change the letters and space characters. When a non-letter occurs in the middle of a word (i.e., there is no whitespace character next to it), it must be simply removed (see what happens to the word 'doesn't' in the example). A word which consists entirely of non-letters will therefore be removed entirely.
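As a minimal sketch of the stripping rule for a single word, assuming a hypothetical helper named `cleanWord` (it does not appear in the solution further down):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns `word` with every non-letter removed.
// Letters are kept unchanged and in their original order.
std::string cleanWord(const std::string& word) {
    std::string out;
    for (unsigned char c : word) {  // unsigned char keeps std::isalpha well-defined
        if (std::isalpha(c)) out += static_cast<char>(c);
    }
    return out;
}
```

With this helper, `doesn't` becomes `doesnt`, and a word made only of non-letters such as `12345` becomes the empty string, which matches the statement that such words are removed entirely.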
There is a special rule for a hyphen ('-') when it is the very last character in a line:
- the word part before the hyphen and the first word part on the next line form a single word;
- this complete word must be written on a line by itself;
- you can assume that there will always be a space before the word part on the first line, and a space after the word part on the second line. These 2 spaces must appear in the output.
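The hyphen rule can be sketched line by line as follows. This is an illustration under the assumptions stated in the rule itself (a space before the first word part and a space after the second); the names `keepLettersAndSpaces` and `joinHyphenated` are hypothetical, and the approach differs from the character-by-character solution further down.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: keeps letters and spaces, drops everything else.
static std::string keepLettersAndSpaces(const std::string& s) {
    std::string out;
    for (unsigned char c : s)
        if (std::isalpha(c) || c == ' ') out += static_cast<char>(c);
    return out;
}

// Sketch of the end-of-line hyphen rule applied to a list of input lines.
// Returns the full output text, one '\n' after every output line.
std::string joinHyphenated(const std::vector<std::string>& lines) {
    std::string out, carry;  // carry: word part waiting for its second half
    for (const std::string& raw : lines) {
        std::string line = keepLettersAndSpaces(raw);
        if (!carry.empty()) {
            // Glue the first word part of this line onto the carried part
            // and write the complete word on a line by itself.
            std::string::size_type sp = line.find(' ');
            if (sp == std::string::npos) sp = line.size();
            out += carry + line.substr(0, sp) + "\n";
            line.erase(0, sp);  // the space after the word part is kept
            carry.clear();
        }
        if (!raw.empty() && raw.back() == '-') {
            // Hold back the last word part; the space before it is kept.
            std::string::size_type sp = line.rfind(' ');
            std::string::size_type start = (sp == std::string::npos) ? 0 : sp + 1;
            carry = line.substr(start);
            line.erase(start);
        }
        out += line + "\n";
    }
    return out;
}
```

For the two lines `them, or hy-` and `phens in the`, this sketch produces `them or ` (with its trailing space), then `hyphens` on a line by itself, then ` in the` (with its leading space).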
Sample Input
A common problem when processing incoming text is to isolate the words in the text. This is made more difficult by the punctuation; words have commas, "quote marks", (even brackets) next to them, or hy-
phens in the middle of the word. This punctuation doesn't count as letters when the words have to be looked up in a # dictionary by the 12345 "**&! program.
#
Sample Output
A common problem when processing incoming text is to isolate the words in the text This is made more difficult by the punctuation words have commas quote marks even brackets next to them or hyphens in the middle of the word This punctuation doesnt count as letters when the words have to be looked up in a dictionary by the program
I've been trying out some new syntax lately, and it seems to be going well.
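#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string line, tmp = "";  // tmp holds the letters of the word being built
    int next = 0;           // 1 while a word split by a trailing hyphen is pending
    while (getline(cin, line)) {
        if (line == "#")    // a line consisting of a single '#' ends the input
            break;
        int i, len = line.length();
        for (i = 0; i < len; i++) {
            unsigned char c = line[i];  // unsigned char keeps isspace/isalpha well-defined
            if (isspace(c)) {
                // Whitespace ends the current word.
                cout << tmp;
                if (next)   // a joined word goes on a line by itself
                    cout << endl;
                cout << line[i];
                next = 0;
                tmp = "";
            } else if (isalpha(c)) {
                tmp += line[i];
            } else if (line[i] == '-') {
                // A hyphen ending the line joins this word part to the next line;
                // any other hyphen is simply dropped.
                if (tmp.length() > 0 && i == len - 1)
                    next = 1;
            } else {
                // Any other non-letter is dropped; flush the letters seen so far.
                cout << tmp;
                tmp = "";
                if (next)
                    cout << endl;
                next = 0;
            }
        }
        // At end of line, flush the word unless it continues on the next line.
        if (next == 0) {
            cout << tmp;
            tmp = "";
        }
        cout << endl;
    }
    return 0;
}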