Longest Common Substring using Suffix Automata
I used to calculate the longest common substring with dynamic programming in O(m * n), a suffix tree in O(m + n), or a suffix array in O(n log^2 n), depending on what I needed. Recently I learned about the suffix automaton, which can be built in O(n) time, and that is very impressive.
I can write code that calculates the length of the longest common substring easily. For example:
Input:
abcdef
xyzabc
Output:
3
And this is the code:
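```cpp
#include <algorithm>
#include <cstdio>
#include <cstring>
using namespace std;

const int maxN = 250500;         // maximum length of the input strings
const int maxState = maxN << 1;  // a suffix automaton has at most 2n - 1 states

struct State {
    State *go[26], *suffix;  // transitions on 'a'..'z' and the suffix link
    int depth, id;           // length of the longest string in this state; state index
};

// Global pool, so every State starts zero-initialized (null transitions and links).
State pool[maxState], *point, *root, *sink;
// Renamed from `size`, which can be ambiguous with std::size (C++17)
// under `using namespace std;` and then fails to compile.
int stateCount;

State *newState(int dep) {
    point->id = stateCount++;
    point->depth = dep;
    return point++;
}

void init() {
    point = pool;
    stateCount = 0;
    root = sink = newState(0);
}

// Appends character a (0..25) to the automaton of the first string.
void insert(int a) {
    State *p = newState(sink->depth + 1);
    State *cur = sink, *sufState;
    // Add transitions on a from every suffix state that lacks one.
    while (cur && !cur->go[a]) {
        cur->go[a] = p;
        cur = cur->suffix;
    }
    if (!cur)
        sufState = root;
    else {
        State *q = cur->go[a];
        if (q->depth == cur->depth + 1)
            sufState = q;
        else {
            // Clone q so the new suffix link points to a state of the right depth.
            State *r = newState(cur->depth + 1);
            memcpy(r->go, q->go, sizeof(q->go));
            r->suffix = q->suffix;
            q->suffix = r;
            sufState = r;
            while (cur && cur->go[a] == q) {
                cur->go[a] = r;
                cur = cur->suffix;
            }
        }
    }
    p->suffix = sufState;
    sink = p;
}

// Walks buf through the automaton. After step i, tmp is the length of the
// longest suffix of buf[0..i] that is also a substring of the first string.
int work(char buf[]) {
    int len = strlen(buf);
    int tmp = 0, ans = 0;
    State *cur = root;
    for (int i = 0; i < len; i++) {
        int c = buf[i] - 'a';
        if (cur->go[c]) {
            // The current match can be extended by buf[i].
            tmp++;
            cur = cur->go[c];
        } else {
            // Shorten the match via suffix links until buf[i] can be appended.
            while (cur && !cur->go[c])
                cur = cur->suffix;
            if (!cur) {
                cur = root;
                tmp = 0;
            } else {
                tmp = cur->depth + 1;
                cur = cur->go[c];
            }
        }
        ans = max(ans, tmp);
    }
    return ans;
}

char ch[maxN];

int main() {
    scanf("%s", ch);
    init();
    int len = strlen(ch);
    for (int i = 0; i < len; i++)
        insert(ch[i] - 'a');
    scanf("%s", ch);
    printf("%d\n", work(ch));
    return 0;
}
```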
But now I need to print the longest common substring itself, not just its length, and I haven't been able to modify my code to do it :( How can this code be modified to print the longest common substring?
When you are at this line:

```cpp
ans = max(ans, tmp);
```

the match of length `tmp` ends at index `i` of `buf`, so it starts at position `i - tmp + 1`. For every `i` where `tmp` equals the final `ans`, you therefore know the position of a longest common substring in the second string. Just pick any of them (for example, record `i` whenever `tmp` becomes a new maximum) and output the `ans` characters starting at `i - ans + 1`.
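A minimal sketch of that change, assuming lowercase input as in the question. To keep it self-contained, it re-implements the same automaton construction with vector indices instead of the question's pointer pool; the names `SuffixAutomaton` and `longestCommonSubstring` are hypothetical and not from the original code. The only real difference from `work()` is in the walk: whenever `len` (the question's `tmp`) exceeds `best`, the end index `i` is recorded, and the answer is cut out of `b` with `substr`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Suffix automaton over 'a'..'z' (hypothetical name), same construction as insert().
struct SuffixAutomaton {
    struct Node {
        std::array<int, 26> go;  // transitions, -1 means "none"
        int link;                // suffix link, -1 for the root
        int depth;               // length of the longest string in this state
    };
    std::vector<Node> st;
    int last = 0;  // state for the whole string inserted so far (the question's sink)

    static Node makeNode(int depth, int link) {
        Node n;
        n.go.fill(-1);
        n.link = link;
        n.depth = depth;
        return n;
    }

    explicit SuffixAutomaton(const std::string& s) {
        st.reserve(2 * s.size() + 1);
        st.push_back(makeNode(0, -1));  // root
        for (char ch : s) extend(ch - 'a');
    }

    void extend(int c) {
        assert(c >= 0 && c < 26);  // assumption: lowercase input only
        int p = static_cast<int>(st.size());
        st.push_back(makeNode(st[last].depth + 1, 0));  // link defaults to root
        int cur = last;
        while (cur != -1 && st[cur].go[c] == -1) {
            st[cur].go[c] = p;
            cur = st[cur].link;
        }
        if (cur != -1) {
            int q = st[cur].go[c];
            if (st[q].depth == st[cur].depth + 1) {
                st[p].link = q;
            } else {
                // Clone q (copying its transitions and old link) at the right depth.
                int r = static_cast<int>(st.size());
                Node clone = st[q];
                clone.depth = st[cur].depth + 1;
                st.push_back(clone);
                st[q].link = r;
                st[p].link = r;
                while (cur != -1 && st[cur].go[c] == q) {
                    st[cur].go[c] = r;
                    cur = st[cur].link;
                }
            }
        }
        last = p;
    }
};

// Hypothetical helper: returns one longest common substring of a and b.
std::string longestCommonSubstring(const std::string& a, const std::string& b) {
    SuffixAutomaton sam(a);
    int cur = 0, len = 0;        // current state and current match length (tmp)
    int best = 0, bestEnd = -1;  // best length so far and the index in b where it ends
    for (int i = 0; i < static_cast<int>(b.size()); ++i) {
        int c = b[i] - 'a';
        assert(c >= 0 && c < 26);
        if (sam.st[cur].go[c] != -1) {
            ++len;
            cur = sam.st[cur].go[c];
        } else {
            while (cur != -1 && sam.st[cur].go[c] == -1) cur = sam.st[cur].link;
            if (cur == -1) {
                cur = 0;
                len = 0;
            } else {
                len = sam.st[cur].depth + 1;
                cur = sam.st[cur].go[c];
            }
        }
        // The current match is b[i - len + 1 .. i]; remember where the best one ends.
        if (len > best) {
            best = len;
            bestEnd = i;
        }
    }
    return best == 0 ? std::string() : b.substr(bestEnd - best + 1, best);
}
```

For the example in the question, `longestCommonSubstring("abcdef", "xyzabc")` returns `"abc"`. Because only a strictly longer match updates `bestEnd`, ties are resolved in favor of the first longest match in the second string; using `>=` instead would keep the last one. In the question's own `work()`, the same idea amounts to an extra `bestEnd` variable updated next to `ans = max(ans, tmp);`.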