Fastest way to search for a string in a substring in Java?

Question

Currently I am working on a Kattis problem, "String Matching" (https://open.kattis.com/problems/stringmatching). My program produces the correct output, but because the input file is so large and the time limit for the problem is 2 seconds, I keep getting the "Time Limit Exceeded" error on Kattis. I've tried two ways to solve the problem, and both exceed the time limit on the second test case. Here is what I have done:

    while (sc.hasNext()) {

        String pattern = sc.nextLine();
        String text = sc.nextLine();

        for(int i = 0; i < text.length()-pattern.length()+1; i++) {
            if(text.regionMatches(i,  pattern, 0, pattern.length())) {
                System.out.print(i + " ");
            }
        }
        System.out.println();
    }

I have also tried it this way:

    while(sc.hasNext()) {

        String pattern = sc.nextLine();
        String text = sc.nextLine();

        for(int i = 0; i < text.length()-pattern.length()+1; i++) {
            if(pattern.equals(text.substring(i, i+pattern.length()))) {
                System.out.print(i + " ");
            }
        }
        System.out.println();
    }

What is a faster way to take a String and check whether it occurs in a larger String?

Answer 1

As per your comment on getting the positions, it's pretty straightforward with a regex matcher. See below:

    Pattern p = Pattern.compile("your regex pattern here");
    Matcher m = p.matcher("the string you're testing");
    if (m.find()) {
        int start = m.start();
        int end = m.end();

        // do something
    }

EDIT: A working example is below.

    public static void main(String...args) throws Exception {
        printMatchInfo("\\w+", "The quick brown fox jumps over the lazy dog");
    }

    private static void printMatchInfo(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        System.out.println(String.format("Checking %s against %s", input, regex));
        while (m.find()) {
            System.out.println(String.format("Found %s with (start, end) (%d, %d)", m.group(), m.start(), m.end()));
        }
    }

Output

    Checking The quick brown fox jumps over the lazy dog against \w+
    Found The with (start, end) (0, 3)
    Found quick with (start, end) (4, 9)
    Found brown with (start, end) (10, 15)

...
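For the case in the question, the pattern is a literal string rather than a regex, and matches may overlap. A minimal sketch of how the matcher approach above could be adapted, assuming that is what is wanted: `Pattern.quote` escapes the pattern so it is treated literally, and `Matcher.find(int)` restarts the search one character after the previous match so overlapping occurrences are not skipped. The class and method names (`LiteralMatchPositions`, `positions`) are made up for illustration, and this is not claimed to be fast enough for the Kattis limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralMatchPositions {
    // Returns the start index of every occurrence of literal in text,
    // including occurrences that overlap each other.
    static List<Integer> positions(String literal, String text) {
        List<Integer> result = new ArrayList<>();
        if (literal.isEmpty()) {
            return result; // assumption: an empty pattern yields no positions
        }
        // Pattern.quote makes regex metacharacters in the pattern match literally.
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(text);
        int from = 0;
        // find(from) resets the matcher and searches starting at index from.
        while (m.find(from)) {
            result.add(m.start());
            from = m.start() + 1; // step one character so overlaps are found
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("aa", "aaaa")); // prints [0, 1, 2]
    }
}
```

A plain `while (m.find())` loop would continue after the end of each match and report only 0 and 2 for this input, which is why the sketch restarts from `m.start() + 1`.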

Answer 2

For massive data, the fastest solution would be a suffix tree or a suffix array.

A naive implementation of a suffix tree is pretty simple; you can check my solution here: https://github.com/sergpank/lcs/blob/master/src/MonoMain.java

If you need to build the suffix tree faster, try Ukkonen's algorithm or any other advanced data structure.
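To illustrate the suffix array idea, here is a minimal sketch. It is not the code from the linked repository, and all names (`NaiveSuffixArraySearch`, `build`, `positions`) are hypothetical. It sorts the suffix start indices with a plain comparator, which is quadratic or worse on repetitive text, then binary-searches for the block of suffixes that begin with the pattern. The naive construction is only meant to show the idea and may well be too slow for large inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NaiveSuffixArraySearch {
    // Suffix array: start indices of all suffixes, sorted lexicographically.
    static Integer[] build(String text) {
        Integer[] sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> compareSuffixes(text, a, b));
        return sa;
    }

    // Character-by-character comparison of the suffixes starting at a and b.
    static int compareSuffixes(String text, int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int d = text.charAt(a) - text.charAt(b);
            if (d != 0) return d;
            a++;
            b++;
        }
        return b - a; // the suffix that ran out first is shorter and sorts first
    }

    // Negative if the suffix at pos sorts before pattern, 0 if it starts
    // with pattern, positive if it sorts after every such suffix.
    static int comparePrefix(String text, int pos, String pattern) {
        for (int k = 0; k < pattern.length(); k++) {
            if (pos + k >= text.length()) return -1; // suffix is a proper prefix of pattern
            int d = text.charAt(pos + k) - pattern.charAt(k);
            if (d != 0) return d;
        }
        return 0;
    }

    // Binary search for the first suffix that is >= pattern (pastMatches = false)
    // or the first suffix that is > all matches (pastMatches = true).
    static int boundary(String text, Integer[] sa, String pattern, boolean pastMatches) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(text, sa[mid], pattern);
            if (c < 0 || (pastMatches && c == 0)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // All start positions of pattern in text, in increasing order.
    static List<Integer> positions(String pattern, String text, Integer[] sa) {
        int from = boundary(text, sa, pattern, false);
        int to = boundary(text, sa, pattern, true);
        List<Integer> result = new ArrayList<>();
        for (int i = from; i < to; i++) result.add(sa[i]);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        String text = "banana";
        Integer[] sa = build(text);
        System.out.println(positions("ana", text, sa)); // prints [1, 3]
    }
}
```

Once the array is built, each query costs about O(m log n) character comparisons for a pattern of length m, plus the time to collect the matches, so the structure pays off mainly when many patterns are searched in the same text.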
