
Longest Common Substring using Suffix Automata

Depending on my needs, I have been computing the longest common substring with dynamic programming in O(m * n), a suffix tree in O(m + n), or a suffix array in O(n log^2 n). Recently I learned about the suffix automaton, which runs in O(n), which is very impressive.

I can easily write code that computes the length of the longest common substring. For example:

Input:
abcdef
xyzabc

Output:
3

And this is the code:

#include <bits/stdc++.h>
using namespace std;

const int maxN = 250500;
const int maxState = maxN << 1;

struct State {
    State *go[26], *suffix;
    int depth, id;
    long long cnt;
};
State pool[maxState], *point, *root, *sink;
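int stateCount;  // states allocated so far (not named "size": that clashes with std::size in C++17)

// Take the next unused state from the pool and set its depth.
State *newState(int dep) {
    point->id = stateCount++;
    point->depth = dep;
    return point++;
}

// Reset the automaton to a single root state.
void init() {
    point = pool;
    stateCount = 0;
    root = sink = newState(0);
}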

void insert(int a) {
    State *p = newState(sink->depth+1);
    State *cur = sink, *sufState;
    while (cur && !cur->go[a]) {
        cur->go[a] = p;
        cur = cur->suffix;
    }
    if (!cur)
        sufState = root;
    else {
        State *q = cur->go[a];
        if (q->depth == cur->depth + 1)
            sufState = q;
        else {
            State *r = newState(cur->depth+1);
            memcpy(r->go, q->go, sizeof(q->go));
            r->suffix = q->suffix;
            q->suffix = r;
            sufState = r;
            while (cur && cur->go[a] == q) {
                cur->go[a] = r;
                cur = cur->suffix;
            }
        }
    }
    p->suffix = sufState;
    sink = p;
}

int work(char buf[]) {
    //printf("%s", buf);
    int len = strlen(buf);
    int tmp = 0, ans = 0;
    State *cur = root;
    for (int i = 0; i < len; i++) {
        if (cur->go[buf[i]-'a']) {
            tmp++;
            cur = cur->go[buf[i]-'a'];
        } else {
            while (cur && !cur->go[buf[i]-'a'])
                cur = cur->suffix;
            if (!cur) {
                cur = root;
                tmp = 0;
            } else {
                tmp = cur->depth + 1;
                cur = cur->go[buf[i]-'a'];
            }
        }
        ans = max(ans, tmp);

    }
    return ans;
}

char ch[maxN];

int main() {
    scanf("%s", ch);
    init();
    int len = strlen(ch);
    for (int i = 0; i < len; i++)
        insert(ch[i]-'a');
    scanf("%s", ch);
    printf("%d\n", work(ch));
    return 0;
}

But now I need to print the longest common substring itself, not its length, and I have not been able to modify my code to do that :( How can this code be modified to print the longest common substring?

When you are at this line:

ans = max(ans, tmp);

The current match has length tmp and ends at index i of buf, so it starts at index i - tmp + 1. Every time tmp equals the final value of ans, that start index marks one longest common substring in the second string, so you know the positions of all of them. Just pick any one (for example, remember the start index whenever ans increases) and print the tmp characters beginning there.
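To make that concrete, here is a minimal sketch of the idea, not a patch to the code in the question. It assumes both strings contain only lowercase letters 'a' to 'z', as the original code does. The names SuffixAutomaton, makeNode, extend, longestCommonSubstring and bestStart are made up for this illustration, and the automaton uses vector indices instead of a pointer pool. The only part that matters for the question is the last function, which remembers i - tmp + 1 whenever the best length improves.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Illustrative suffix automaton over 'a'..'z' (assumption: lowercase input only).
struct SuffixAutomaton {
    struct Node {
        std::array<int, 26> next;  // transitions, -1 if absent
        int link;                  // suffix link, -1 for the root
        int len;                   // longest string in this state ("depth" in the question)
    };
    std::vector<Node> nodes;
    int last = 0;  // state for the whole prefix inserted so far ("sink" in the question)

    explicit SuffixAutomaton(const std::string& s) {
        nodes.reserve(2 * s.size() + 1);
        nodes.push_back(makeNode(0, -1));  // root
        for (char ch : s) extend(ch - 'a');
    }

    static Node makeNode(int len, int link) {
        Node n;
        n.next.fill(-1);
        n.link = link;
        n.len = len;
        return n;
    }

    void extend(int c) {
        int cur = static_cast<int>(nodes.size());
        nodes.push_back(makeNode(nodes[last].len + 1, 0));  // link defaults to root
        int p = last;
        while (p != -1 && nodes[p].next[c] == -1) {
            nodes[p].next[c] = cur;
            p = nodes[p].link;
        }
        if (p != -1) {
            int q = nodes[p].next[c];
            if (nodes[q].len == nodes[p].len + 1) {
                nodes[cur].link = q;
            } else {
                // Split q: the clone keeps q's transitions and link but is shorter.
                int clone = static_cast<int>(nodes.size());
                Node copy = nodes[q];
                copy.len = nodes[p].len + 1;
                nodes.push_back(copy);
                while (p != -1 && nodes[p].next[c] == q) {
                    nodes[p].next[c] = clone;
                    p = nodes[p].link;
                }
                nodes[q].link = clone;
                nodes[cur].link = clone;
            }
        }
        last = cur;
    }
};

// Returns one longest common substring of a and b (the first one found in b),
// or an empty string if they share no character.
std::string longestCommonSubstring(const std::string& a, const std::string& b) {
    SuffixAutomaton sa(a);
    int cur = 0, tmp = 0, ans = 0, bestStart = 0;
    for (int i = 0; i < static_cast<int>(b.size()); i++) {
        int c = b[i] - 'a';
        // Follow suffix links until some state has a transition on c.
        while (cur != -1 && sa.nodes[cur].next[c] == -1)
            cur = sa.nodes[cur].link;
        if (cur == -1) {
            cur = 0;
            tmp = 0;
        } else {
            // Equals tmp + 1 if no fallback happened, else len + 1.
            tmp = std::min(tmp, sa.nodes[cur].len) + 1;
            cur = sa.nodes[cur].next[c];
        }
        if (tmp > ans) {
            ans = tmp;
            bestStart = i - tmp + 1;  // the match ending at i starts here
        }
    }
    return b.substr(bestStart, ans);
}
```

Calling longestCommonSubstring("abcdef", "xyzabc") returns "abc" for the example in the question. In the original work() function the same change amounts to keeping one extra integer next to ans and printing tmp characters of buf from that index at the end.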


 