Efficient way to test if a string is a substring of any string in a list of strings
I want to know the best way to compare a string to a list of strings. Here is the code I have in mind, but it is clearly not good in terms of time complexity.
    for (String large : list1) {
        for (String small : list2) {
            if (large.contains(small)) {
                // DO SOMETHING
            } else {
                // NOT FOR ME
            }
        }
        // FURTHER MANIPULATION OF STRING
    }
Both lists can contain more than a thousand values, so the worst-case cost can rise to roughly 1000 × 1000 × length character comparisons, which is a mess. I want to know the best way to compare a string with a list of strings in the scenario above.
You could just do this:
    for (String small : list2) {
        if (set1.contains(small)) {
            // DO SOMETHING
        } else {
            // NOT FOR ME
        }
    }
set1 should hold the larger collection of strings, and instead of keeping it as a List<String>, use a Set<String> such as a HashSet<String>.
Thanks to the first answer by sandeep. Here is the solution:
    List<String> firstCollection = new ArrayList<>();
    Set<String> secondCollection = new HashSet<>();
    // POPULATE BOTH LISTS HERE.
    for (String string : firstCollection) {
        if (secondCollection.contains(string)) {
            // YES, THE STRING IS THERE IN THE SECOND LIST
        } else {
            // NOPE, THE STRING IS NOT THERE IN THE SECOND LIST
        }
    }
This is, unfortunately, a difficult and messy problem. That is because you are checking whether a small string is a substring of any of a bunch of large strings, rather than checking whether the small string is equal to one of them.
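To make the distinction concrete, here is a minimal sketch. The class name, method names, and sample data are invented for illustration. It shows that a hash-set lookup only finds exact matches, while a scan with String.contains finds substrings:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EqualityVsSubstring {
    // True if `small` is exactly equal to one of the large strings (hash lookup).
    static boolean equalsAny(Set<String> largeSet, String small) {
        return largeSet.contains(small);
    }

    // True if `small` occurs inside at least one of the large strings (linear scan).
    static boolean substringOfAny(List<String> largeList, String small) {
        for (String large : largeList) {
            if (large.contains(small)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> largeList = List.of("hello world", "substring search");
        Set<String> largeSet = new HashSet<>(largeList);

        // "world" is not an element of the set, but it is a substring of "hello world".
        System.out.println("equal to any:     " + equalsAny(largeSet, "world"));
        System.out.println("substring of any: " + substringOfAny(largeList, "world"));
    }
}
```

So a Set speeds up the equality test, but it does not answer the substring question asked above.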
The best solution depends on exactly what problem you need to solve, but here is a reasonable first attempt:
In a temporary place, concatenate all the large strings together, then construct a suffix tree on this long concatenated string. With this structure, we should be able to quickly find all the substring matches of any given small string among all the large strings.
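A full suffix tree is lengthy to implement, so the sketch below swaps in a simpler relative, a naively sorted suffix array over the concatenated text, to illustrate the same idea. It is not the suffix tree itself and not a tuned implementation. The class name SubstringIndex, the method findContaining, and the sample data are hypothetical, and the code assumes the separator character '\u0000' never occurs in any input string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SubstringIndex {
    // Assumption: this separator never occurs in the large or small strings.
    private static final char SEPARATOR = '\u0000';

    private final String text;        // all large strings, each followed by SEPARATOR
    private final Integer[] suffixes; // start offsets of every suffix, in sorted order
    private final int[] owner;        // owner[i] = index of the large string covering offset i

    public SubstringIndex(List<String> largeStrings) {
        StringBuilder sb = new StringBuilder();
        for (String s : largeStrings) {
            sb.append(s).append(SEPARATOR);
        }
        text = sb.toString();

        owner = new int[text.length()];
        int pos = 0;
        for (int i = 0; i < largeStrings.size(); i++) {
            int len = largeStrings.get(i).length();
            Arrays.fill(owner, pos, pos + len + 1, i); // includes the trailing separator
            pos += len + 1;
        }

        suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = i;
        }
        // Naive construction: comparison sort of all suffixes (slow in the worst case).
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    // Lexicographic comparison of the suffixes starting at offsets a and b.
    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int diff = text.charAt(a) - text.charAt(b);
            if (diff != 0) return diff;
            a++;
            b++;
        }
        return b - a; // the suffix that ran out first is shorter and sorts first
    }

    // Compares the suffix at `start`, truncated to pattern.length() chars, with pattern.
    private int comparePrefix(int start, String pattern) {
        for (int k = 0; k < pattern.length(); k++) {
            if (start + k >= text.length()) return -1; // suffix ended: sorts before pattern
            int diff = text.charAt(start + k) - pattern.charAt(k);
            if (diff != 0) return diff;
        }
        return 0; // pattern is a prefix of this suffix
    }

    // Returns the indices of all large strings that contain `small`.
    public Set<Integer> findContaining(String small) {
        if (small.isEmpty() || small.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("pattern must be non-empty and separator-free");
        }
        // Binary search for the first suffix whose prefix is >= small.
        int lo = 0, hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(suffixes[mid], small) < 0) lo = mid + 1; else hi = mid;
        }
        // All suffixes that start with `small` are adjacent in sorted order.
        Set<Integer> result = new TreeSet<>();
        for (int i = lo; i < suffixes.length && comparePrefix(suffixes[i], small) == 0; i++) {
            result.add(owner[suffixes[i]]);
        }
        return result;
    }

    public static void main(String[] args) {
        SubstringIndex index = new SubstringIndex(List.of("banana", "bandana", "cabin"));
        System.out.println(index.findContaining("ana")); // indices of "banana" and "bandana"
    }
}
```

Each lookup costs a binary search plus one step per occurrence, instead of a scan over every large string. The naive sort makes index construction expensive for long inputs, so for large data a linear-time suffix tree or suffix array construction would likely be a better fit.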