[UVA][bitmask] 12515 - Movie Police
Movie Police
Movie Police |
Movie Police (MP) is an international top secret law enforcement agency, controlling illegal movie downloads on internet. With their elite team of programmers, MP has developed a very smart algorithm to produce movie signatures. A movie signature is a binary string, one bit for every frame in the movie, so that the i-th bit in the signature corresponds to the i-th frame in the movie. The algorithm is so amazing that it outputs consistently the same signature for versions of the same film with different quality resolutions. One application of this revolutionary technology uses it to detect if a small clip is part of a movie, looking for a high similarity between the clip signature and the movie signature.
Now MP has started to apply this technology and, as a first step, a massive online
database of movie signatures was already built. As a new member of the MP crew, you must write a
program that, given the signature of a clip, finds the index in the MP database of a movie whose
signature matches the clip signature at most. That is, a movie whose signature has a
substring, of the same length of the clip signature, that is most similar to the clip
signature.
Similarity between strings of the same length is defined by means of their Hamming
distance (number of bits that do not match), so that ``more similar'' means ``less Hamming
distance''.
Input
The first line of the input contains two positive integer numbers M and Q, separated
by a blank, where M indicates the number of movie signatures in the database and Q indicates the
number of clip signatures to process (
1 M
1000,
1
Q
500). Each one
of the following M lines contains a binary string si describing the i-th movie signature in
the database. You may suppose that si has length li, where
1
li
100. Finally,
there are Q lines, each one with a binary string that corresponds to a clip signature to search
for maximal similarity in the database. You may assume that, for every clip signature to be
searched, there is at least one movie signature in the database whose length is greater or equal
than the clip's length.
Output
For each clip signature given in the input, output a single line with the lowest index i
of a movie si (
1 i
M) that matches the clip at most, as above explained. If there
are two movie signatures that match the clip signature maximally, answer the one with lower index in
the database.
Sample Input
3 1 000011 1101111111 1111100000 1000111
Sample Output
2
題目描述:
現在有很多電影的片段,以及驗證版權碼,對於每個電影片段,找到其中一段 substring 去對應驗證碼,去計算漢明距離,即有多少 bit 是不同的,找一個漢明距離最小的電影片段。
題目解法:
對於一組詢問,會消耗 O(M*len*len),每個電影抓出來進行比較,又去切割 substring,要跟驗證碼一樣長,然後還要去比較。這樣很容易 TLE,但至少也要跑 O(M) 次,只能將比較次數往下壓。
於是將 32 bit 做一次 mask,藉由 xor 得到一個整數,在 32 bit 數字中得到 1 個個數
使用 __builtin_popcount 涵式去計算 1 的個數,因次一次可以跳 32 的次數。
基本上比較的次數將會變成 O(len/32) => 總共是 O(M*len*len/32)。
不可能的解就跳出。
#include <stdio.h>
#include <iostream>
#include <algorithm>
#include <string.h>
using namespace std;
char s[1005][105], ss[105];
unsigned int v[1005][5];
int dlen[1005];
int main() {
int n, q;
int i, j, k;
while(scanf("%d %d", &n, &q) == 2) {
int i, j, k;
for(i = 0; i < n; i++) {
scanf("%s", s[i]);
dlen[i] = strlen(s[i]);
}
memset(v, 0, sizeof(v));
for(i = 0; i < n; i++) {
for(j = 0, k = 0; j < dlen[i]; j++) {
v[i][k] = (v[i][k]<<1)|(s[i][j]-'0');
if(j%32 == 31) k++;
}
}
while(q--) {
scanf("%s", ss);
int len = strlen(ss);
int vv[105] = {};
for(i = 0; i < len; i++) {
for(j = 0; j < 32 && i+j < len; j++) {
vv[i] = (vv[i]<<1)|(ss[i+j]-'0');
}
}
int mn = 0xfffffff, ret = 0;
int p, q, diff, val;
for(i = 0; i < n; i++) {
int submn = 0xffffff;
for(j = 0; j+len <= dlen[i]; j++) {
p = j, q = 0;
diff = 0;
while(q < len) {
if(q+32 <= len && p%32 == 0 && p+32 <= dlen[i]) {
val = vv[q]^v[i][p/32];
diff += __builtin_popcount(val);
p += 32, q += 32;
} else {
diff += (ss[q] != s[i][p]);
p++, q++;
}
if(diff >= mn) break;
}
submn = min(submn, diff);
}
if(submn < mn) {
mn = submn, ret = i;
}
}
//printf("%d\n", mn);
printf("%d\n", ret+1);
}
}
return 0;
}
/*
2 1
0000000000000000000000000000000000000000000000000000000000000000
1111111111111111111111111111111111111111111111111111111111111111
1000000000000000000000000000000000000000000000000000000000000000
*/
// __builtin_popcount