
Is it possible to use the KMP algorithm to find a longest substring?

Original 2014-04-07 03:32:20 6 2 string/ algorithm/ pattern-matching/ knuth-morris-pratt

Suppose I have a pattern P and some text T, and I want to find the longest prefix of P that matches a substring of T. Is it possible to modify the KMP algorithm to do this? (If I remember correctly, the KMP algorithm does partial matches, but I am interested in the longest possible match.)

2 answers

As KMP scans the text, its state is the length of the longest prefix of the pattern that matches the text ending at the current position. So you could record the maximum length seen and the position in the text at which it was seen, and it does look like you could use this to find a longest matching prefix of P.
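The idea above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function name `longest_prefix_match` and its return convention (length, start index in the text) are assumptions made for this example.

```python
def longest_prefix_match(pattern, text):
    """Return (length, start) for the longest prefix of `pattern` found in `text`.

    `start` is the index in `text` where that prefix begins.
    Hypothetical helper written for illustration only.
    """
    if not pattern:
        return 0, 0

    # failure[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of pattern[:i+1] (the usual KMP table).
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    best_len, best_start = 0, 0
    state = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while state > 0 and ch != pattern[state]:
            state = failure[state - 1]
        if ch == pattern[state]:
            state += 1
        # Record the longest state seen so far and where it started.
        if state > best_len:
            best_len, best_start = state, i - state + 1
        if state == len(pattern):
            break  # the whole pattern matched; nothing longer is possible
    return best_len, best_start


print(longest_prefix_match("abcd", "xxabcxabz"))
```

The scan is the ordinary KMP loop; the only addition is the bookkeeping of the maximum state, so the running time stays linear in the combined length of P and T.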

Another way of doing this would be to put all prefixes of P into Aho-Corasick. The run-time behaviour would be very similar, although it would consume a little more memory. It would also let you use an existing Aho-Corasick library, if you have one, instead of modifying a KMP implementation.

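Actually, this is a typical use case for the so-called "extended KMP" algorithm, which computes, for every position in T, the length of the longest prefix of P that matches starting at that position.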
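One common way to get the same per-position information is the Z-function applied to P, a separator, and T concatenated together. The sketch below uses that approach and is not necessarily what the linked samples do; the names `z_function` and `prefix_match_lengths` are hypothetical, and the separator `"\x00"` is assumed not to occur in either string.

```python
def z_function(s):
    """z[i] = length of the longest common prefix of s and s[i:] (z[0] left as 0)."""
    n = len(s)
    z = [0] * n
    left = right = 0  # s[left:right] is the rightmost segment known to match a prefix
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def prefix_match_lengths(pattern, text):
    """For each index i in text, the length of the longest prefix of pattern matching at i."""
    sep = "\x00"  # assumption: this character appears in neither pattern nor text
    z = z_function(pattern + sep + text)
    return z[len(pattern) + 1:]


lengths = prefix_match_lengths("abcd", "xxabcxabz")
print(max(lengths), lengths.index(max(lengths)))
```

Taking the maximum of the returned list gives the longest prefix of P that occurs anywhere in T, and its index gives the starting position in T.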

See the sample code here and here.