
How to find the longest common substring

What is the best available algorithm for finding the longest common substring? The strings contain 16000+ characters and the alphabet is ACDT.

  1. Boyer–Moore–Horspool - takes far too long
  2. Rabin-Karp - worse than the first one
  3. Suffix tree - the 2D array runs out of memory

Are there any other methods or modifications? What I actually want is to calculate the average common substring of two genomes.

Gnomes or genomes?!

See here. Dynamic programming may be the route to take?

Also note that the first two algorithms you've listed are for string searching.
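The dynamic-programming route suggested above could look roughly like the sketch below. It is only an illustration, not code from the linked page: the class name `LcsDp`, the method name `longestCommonSubstring`, and the sample inputs are made up for this example. It keeps just two rows of the DP table, so memory grows with the length of one string rather than with the product of both lengths, which is what tends to blow up a full 2D array.

```java
// Hypothetical class and method names, chosen only for this sketch.
class LcsDp {
    // Returns one longest common substring of a and b
    // (the one that ends earliest in a if there are ties).
    static String longestCommonSubstring(String a, String b) {
        // prev[j] and curr[j] hold the length of the longest common suffix
        // of a's prefix (previous / current row) and the first j chars of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the best match in a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the match that ended one character earlier in both strings.
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > bestLen) {
                        bestLen = curr[j];
                        bestEnd = i;
                    }
                } else {
                    curr[j] = 0;
                }
            }
            // Reuse the two rows instead of allocating a full 2D table.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return a.substring(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        // Made-up sample inputs, not taken from the question.
        System.out.println(longestCommonSubstring("ACGTACGT", "TTACGTAA"));
    }
}
```

For two strings of about 16000 characters this still performs on the order of 16000 × 16000 (about 256 million) character comparisons, so it trades the memory problem for running time; whether that is acceptable depends on how many genome pairs need to be compared.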

public class LongestCommonSubS {
    public static void main(String[] args) {
        String str1 = "Koushikpaul";
        String str2 = "asdfgoushqwertikpauzxcv";
        String longest = "";
        // Try every pair of start positions (i in str1, j in str2), so that
        // later occurrences of a character in str2 are not skipped.
        for (int i = 0; i < str1.length(); i++) {
            for (int j = 0; j < str2.length(); j++) {
                // Extend the match for as long as both strings agree.
                int k = 0;
                while (i + k < str1.length() && j + k < str2.length()
                        && str1.charAt(i + k) == str2.charAt(j + k)) {
                    k++;
                }
                if (k > longest.length()) longest = str1.substring(i, i + k);
            }
        }
        System.out.println(longest); // prints "ikpau"
    }
}


 