Fastest way to return a list of Strings from a Java collection using wildcards

Question

I have a Set of 100000 Strings. For example, I want to get every String in that Set that starts with "JO". What would be the best solution?

Answer 1

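If you want all Strings that start with a given sequence, add all the Strings to a NavigableSet such as a TreeSet; subSet(text, text + '\uFFFF') then gives you every entry starting with text. Locating the range is O(log n).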
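The prefix lookup can be sketched roughly as follows. The class name PrefixSearch and the method startingWith are hypothetical, chosen only for illustration, and the sketch assumes the set uses natural String ordering and that no element contains '\uFFFF' directly after the prefix.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper; the names are illustrative and not part of the answer.
final class PrefixSearch {

    // Returns a live view of every element that starts with the given prefix.
    static NavigableSet<String> startingWith(NavigableSet<String> sorted, String prefix) {
        // '\uFFFF' is the largest char value, so prefix + '\uFFFF' sorts after
        // any ordinary String that begins with prefix.
        return sorted.subSet(prefix, true, prefix + '\uFFFF', false);
    }

    public static void main(String[] args) {
        NavigableSet<String> names = new TreeSet<>(
                Arrays.asList("JOHN", "JOE", "JACK", "JO", "JUDY", "BOB"));
        System.out.println(startingWith(names, "JO")); // prints [JO, JOE, JOHN]
    }
}
```

Finding the range is logarithmic, but iterating over the view still costs time proportional to the number of matches.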

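If you want all Strings that end with a sequence, you can do something similar, except that you have to reverse the Strings. In that case a TreeMap from the reversed String to the forward String would be a better structure.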
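A minimal sketch of the reversed-key idea, assuming the Strings are distinct (as in a Set). SuffixSearch, reverse and endingWith are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper; the names are illustrative and not part of the answer.
final class SuffixSearch {
    // Key: reversed String, value: the original (forward) String.
    private final NavigableMap<String, String> byReversed = new TreeMap<>();

    SuffixSearch(Iterable<String> items) {
        for (String s : items) {
            byReversed.put(reverse(s), s);
        }
    }

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // A suffix of s is a prefix of reverse(s), so a prefix range query
    // on the reversed keys finds every String ending with the suffix.
    List<String> endingWith(String suffix) {
        String key = reverse(suffix);
        return new ArrayList<>(
                byReversed.subMap(key, true, key + '\uFFFF', false).values());
    }
}
```

The results come back ordered by their reversed form, so sort them again if alphabetical order matters.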

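If you want "x*z", you can search the first collection for the prefix and intersect the result with the values from the Map for the suffix.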
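One way to combine the two lookups for a pattern such as "x*z" is sketched below. PrefixSuffixSearch and match are hypothetical names, and the length check is an added assumption: it keeps the prefix and suffix from overlapping, so that "ab*ba" does not match "aba".

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; the names are illustrative and not part of the answer.
final class PrefixSuffixSearch {
    private final NavigableSet<String> forward = new TreeSet<>();
    private final NavigableMap<String, String> reversed = new TreeMap<>();

    PrefixSuffixSearch(Collection<String> items) {
        for (String s : items) {
            forward.add(s);
            reversed.put(new StringBuilder(s).reverse().toString(), s);
        }
    }

    // Matches prefix + "*" + suffix.
    Set<String> match(String prefix, String suffix) {
        // Candidates that start with the prefix.
        Set<String> result = new TreeSet<>(
                forward.subSet(prefix, true, prefix + '\uFFFF', false));
        // Keep only those that also end with the suffix.
        String key = new StringBuilder(suffix).reverse().toString();
        result.retainAll(new HashSet<>(
                reversed.subMap(key, true, key + '\uFFFF', false).values()));
        // Prefix and suffix must not overlap inside the matched String.
        result.removeIf(s -> s.length() < prefix.length() + suffix.length());
        return result;
    }
}
```

If one side is far more selective than the other, it may be cheaper to query only that side and check the rest with startsWith or endsWith.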

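If you want contains ("*x*"), you can use a NavigableMap<String, Set<String>> where the keys are the substrings of each String starting from the first, second, third character and so on. The value is a Set because several Strings can share the same key. You can then search it the same way as the starts-with structure.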
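The contains index might look like the sketch below, which keys every String by each of its suffixes. ContainsIndex and containing are hypothetical names used only for illustration.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; the names are illustrative and not part of the answer.
final class ContainsIndex {
    // Key: a suffix of some String, value: every String that has that suffix.
    private final NavigableMap<String, Set<String>> bySuffix = new TreeMap<>();

    ContainsIndex(Iterable<String> items) {
        for (String s : items) {
            for (int i = 0; i < s.length(); i++) {
                bySuffix.computeIfAbsent(s.substring(i), k -> new TreeSet<>()).add(s);
            }
        }
    }

    // A String contains text exactly when one of its suffixes starts with text,
    // so a prefix range query over the suffix keys finds all matches.
    Set<String> containing(String text) {
        Set<String> result = new TreeSet<>();
        for (Set<String> hits
                : bySuffix.subMap(text, true, text + '\uFFFF', false).values()) {
            result.addAll(hits);
        }
        return result;
    }
}
```

The trade-off is memory: each String of length k contributes k keys, so this layout suits short Strings best.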

Answer 2

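Here is a custom matcher class that does the matching without regular expressions (to be precise, it uses a regex only in the constructor) and supports wildcard matching: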

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WildCardMatcher {
    private final List<String> patternParts; // literal pieces between the '*'s
    private final boolean openStart;         // pattern starts with '*'
    private final boolean openEnd;           // pattern ends with '*'

    public WildCardMatcher(final String pattern) {
        final List<String> tmpList = new ArrayList<String>(
                                     Arrays.asList(pattern.split("\\*")));
        while (tmpList.remove("")) { /* remove empty Strings */ }
        // the two lines above could be simplified with a splitter that
        // omits empty strings, e.g. Guava's Splitter
        if (tmpList.isEmpty())
            throw new IllegalArgumentException("Invalid pattern");
        patternParts = tmpList;
        openStart = pattern.startsWith("*");
        openEnd = pattern.endsWith("*");
    }

    public boolean matches(final String item) {
        final int last = patternParts.size() - 1;
        int nextIndex = 0; // where the search for the next part begins
        for (int i = 0; i <= last; i++) {
            final String part = patternParts.get(i);
            final int index;
            if (i == last && !openEnd) {
                // closed end: the last part must be the suffix of the item
                // and must not overlap the parts matched before it
                index = item.length() - part.length();
                if (index < nextIndex || !item.endsWith(part))
                    return false;
            } else {
                index = item.indexOf(part, nextIndex);
            }
            // closed start: the first part must sit at position 0
            if (index < 0 || (i == 0 && index > 0 && !openStart))
                return false;
            nextIndex = index + part.length();
        }
        return true;
    }

}

Here is some test code:

public static void main(final String[] args) throws Exception {
    testMatch("foo*bar", "foobar", "foo123bar", "foo*bar", "foobarandsomethingelse");
    testMatch("*.*", "somefile.doc", "somefile", ".doc", "somefile.");
    testMatch("pe*", "peter", "antipeter");
}

private static void testMatch(final String pattern, final String... words) {
    final WildCardMatcher matcher = new WildCardMatcher(pattern);
    for (final String word : words) {
        System.out.println("Pattern " + pattern + " matches word '"
                          + word + "': " + matcher.matches(word));
    }
}

Output:

Pattern foo*bar matches word 'foobar': true
Pattern foo*bar matches word 'foo123bar': true
Pattern foo*bar matches word 'foo*bar': true
Pattern foo*bar matches word 'foobarandsomethingelse': false
Pattern *.* matches word 'somefile.doc': true
Pattern *.* matches word 'somefile': false
Pattern *.* matches word '.doc': true
Pattern *.* matches word 'somefile.': true
Pattern pe* matches word 'peter': true
Pattern pe* matches word 'antipeter': false

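This is far from production-ready, but it should be fast enough, and it supports multiple wildcards (including at the start and at the end). Of course, if your wildcards are only at the end, use Peter's answer (+1).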

Answer 1
10, accepted, 2011-05-10 15:41:24

Answer 2
2, 2011-05-10 16:08:33
