Is there an efficient way to detect if a string contains a substring which is in a large set of characteristic strings?
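For example, given the string aaaaaaaaaXyz, I want to know whether it contains a substring that is in the set of characteristic strings {'xy','xyz','zzz','cccc','dddd',....}, which may have 1 million members. Is there an efficient way to do this?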
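Given that your search set may be very large, I suggest simply iterating over the set and checking each member for a potential substring match: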
// Returns true as soon as any member of subs occurs inside input.
public boolean containsSubstring(String input, Set<String> subs) {
    for (String sub : subs) {
        if (input.contains(sub)) {
            return true; // stop at the first match
        }
    }
    return false;
}
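As a usage illustration, the same check can be written with a stream and run against the question's example. This is only a sketch: the class name ContainsSubstringDemo is made up for the demo, and the set is the small sample from the question rather than a million entries. Note that String.contains is case-sensitive, so the capital X in aaaaaaaaaXyz does not match 'xy'.

```java
import java.util.Set;

class ContainsSubstringDemo {
    // Same behavior as the loop above: true if any member of subs occurs in input.
    static boolean containsSubstring(String input, Set<String> subs) {
        return subs.stream().anyMatch(input::contains);
    }

    public static void main(String[] args) {
        Set<String> subs = Set.of("xy", "xyz", "zzz", "cccc", "dddd");
        System.out.println(containsSubstring("aaaaaaaaaxyz", subs)); // prints true
        System.out.println(containsSubstring("aaaaaaaaaXyz", subs)); // prints false (case-sensitive)
    }
}
```

Either form still does one scan of the input per set member, so the cost grows with the size of the set.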
First, prepare a dictionary that groups the words by their first character, like this:
Set<String> stringSet = Set.of("xy", "xyz", "zzz", "zzy", "cccc", "dddd");
// Group the words by their first character.
Map<Character, List<String>> dictionary = new HashMap<>();
for (String word : stringSet)
    dictionary.computeIfAbsent(word.charAt(0), k -> new ArrayList<>()).add(word);
System.out.println(dictionary);
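Output (the order of the words within each list may differ between runs, because the iteration order of Set.of is unspecified):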
{c=[cccc], d=[dddd], x=[xyz, xy], z=[zzy, zzz]}
You can then use this method to check whether the input contains any of the words:
static boolean contains(String input, Map<Character, List<String>> dictionary) {
    for (int i = 0, max = input.length(); i < max; ++i) {
        char first = input.charAt(i);
        // Only test the words that start with the character at position i.
        if (dictionary.containsKey(first))
            for (String word : dictionary.get(first))
                if (input.startsWith(word, i))
                    return true;
    }
    return false;
}
Following Clashsoft's hint, I found a Java implementation of the Aho-Corasick algorithm, which is exactly what I wanted. Thanks, Clashsoft.
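For readers who want to see the idea, here is a minimal sketch of Aho-Corasick in Java. It is not the implementation the asker found (the post does not say which one that was); the class name AhoCorasickSketch and the method matchesAny are made up for illustration, and the sketch only answers "does any pattern occur?" rather than reporting which patterns matched. It builds a trie of the patterns, adds failure links with a breadth-first pass, and then scans the input once.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration class, not a library API.
class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;        // longest proper suffix of this node's path that is also in the trie
        boolean terminal; // true if some pattern ends here, directly or via failure links
    }

    private final Node root = new Node();

    AhoCorasickSketch(Collection<String> patterns) {
        // Phase 1: insert every non-empty pattern into the trie.
        for (String p : patterns) {
            if (p.isEmpty()) continue;
            Node node = root;
            for (int i = 0; i < p.length(); i++) {
                node = node.next.computeIfAbsent(p.charAt(i), k -> new Node());
            }
            node.terminal = true;
        }
        // Phase 2: breadth-first pass to compute failure links.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) f = f.fail;
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // A pattern ending at the failure target also ends at this position.
                child.terminal |= child.fail.terminal;
                queue.add(child);
            }
        }
    }

    // Scans the input once and returns true if any pattern occurs in it.
    boolean matchesAny(String input) {
        Node node = root;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            while (node != root && !node.next.containsKey(c)) node = node.fail;
            Node step = node.next.get(c);
            node = (step != null) ? step : root;
            if (node.terminal) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac = new AhoCorasickSketch(List.of("xy", "xyz", "zzz", "cccc", "dddd"));
        System.out.println(ac.matchesAny("aaaaaaaaaxyz")); // prints true
        System.out.println(ac.matchesAny("aaaaaaaaaXyz")); // prints false (matching is case-sensitive)
    }
}
```

The automaton is built once from the set, and each lookup then walks the input a single time regardless of how many patterns there are, which is why it suits a set with around a million members. A HashMap per node is used here for brevity; a production implementation would likely choose a more compact transition table.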