How to search for a substring in a big List<String> - Java

Question

I have a big List<String> with about 50,000 records. I want an efficient way to search that List for a specific substring and get the Strings that contain it.

My code so far looks like this:

List<String> result = new ArrayList<>();
if (aCondition) {
    for (String file : arg) {
        if (file.toLowerCase().contains(tag.toLowerCase())) {
            result.add(file);
        }
    }
}
return result;

Answer 1

It depends on what you mean by "effective".

If you want "minimal" CPU usage, there isn't much you can do: you have to iterate the whole list and compare every entry. The one obvious thing not to do is call tag.toLowerCase() in the loop body on every iteration. Just compute that value once, before entering the loop!
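A minimal sketch of the question's loop with only that change applied, assuming the search is wrapped in a method. The class name SubstringSearch and the method name search are hypothetical, and aCondition is left out for brevity:

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringSearch {

    // Returns every entry of `arg` that contains `tag`, ignoring case.
    // `tag` is lower-cased once, before the loop, instead of once per entry.
    static List<String> search(List<String> arg, String tag) {
        String lowerTag = tag.toLowerCase();
        List<String> result = new ArrayList<>();
        for (String file : arg) {
            // Each entry still has to be lower-cased for its own comparison.
            if (file.toLowerCase().contains(lowerTag)) {
                result.add(file);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = List.of("Report.TXT", "notes.md", "old_report.txt");
        System.out.println(search(files, "report")); // prints [Report.TXT, old_report.txt]
    }
}
```

The work is still one pass over all 50,000 entries; the change only removes 50,000 redundant lower-casings of the same tag.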

If you care about getting the result in less time, the answer is simple: use multiple threads, and have each thread search a "slice" of the overall list. (Of course, that can get complicated quickly, as you now have to preserve the order and handle other subtle details.)
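A minimal sketch of the slicing idea using a fixed thread pool from java.util.concurrent. The class name SlicedSearch, the method name search and the threads parameter are hypothetical. Each slice's matches are collected separately and then appended in slice order, which is one simple way to keep the original list order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SlicedSearch {

    // Splits `arg` into roughly equal slices (threads must be >= 1), searches
    // each slice on a pool thread, then concatenates the per-slice results in
    // slice order so the output keeps the original list order.
    static List<String> search(List<String> arg, String tag, int threads)
            throws InterruptedException, ExecutionException {
        String lowerTag = tag.toLowerCase();
        int sliceSize = (arg.size() + threads - 1) / threads; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < arg.size(); start += sliceSize) {
                // Read-only view of one slice; the list is not modified while searching.
                List<String> slice = arg.subList(start, Math.min(start + sliceSize, arg.size()));
                futures.add(pool.submit(() -> {
                    List<String> partial = new ArrayList<>();
                    for (String file : slice) {
                        if (file.toLowerCase().contains(lowerTag)) {
                            partial.add(file);
                        }
                    }
                    return partial;
                }));
            }
            List<String> result = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                result.addAll(f.get()); // blocks until that slice is done; futures are in slice order
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = List.of("a.txt", "tag1", "b.txt", "TAG2", "c.txt");
        System.out.println(search(files, "tag", 2)); // prints [tag1, TAG2]
    }
}
```

For only 50,000 entries, the cost of creating and coordinating threads may eat much of the gain, so it is worth measuring against the single-threaded loop.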

Finally, you might want to look into tools such as Elasticsearch; there are various products designed to do exactly that: search huge amounts of text.

Answer 2

Consider using an SQL database to hold large amounts of data.

This way you can use a simple query to get the Strings that contain a substring (see the example below). Furthermore, that data no longer has to be held in memory in a list.

For example:

SELECT * FROM word_list_table WHERE word LIKE '%substring%'
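A minimal JDBC sketch of running that query from Java with a PreparedStatement, assuming the table and column names from the example query, a Connection supplied by the caller, and a JDBC driver for your database on the classpath. The class and method names are hypothetical. The search text is lower-cased and compared against LOWER(word) to mirror the question's case-insensitive check, and any % or _ in it is escaped so it is matched literally; '!' is chosen as the escape character (declared with the standard ESCAPE clause) to sidestep differences in how databases treat backslashes:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class SqlSubstringSearch {

    // Escapes the escape character itself first, then the LIKE wildcards % and _,
    // and wraps the result in % so it matches anywhere in the column value.
    static String containsPattern(String substring) {
        String escaped = substring
                .replace("!", "!!")
                .replace("%", "!%")
                .replace("_", "!_");
        return "%" + escaped + "%";
    }

    // Case-insensitive substring search against the hypothetical word_list_table.word column.
    static List<String> search(Connection conn, String substring) throws SQLException {
        String sql = "SELECT word FROM word_list_table WHERE LOWER(word) LIKE ? ESCAPE '!'";
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, containsPattern(substring.toLowerCase()));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString("word"));
                }
            }
        }
        return result;
    }
}
```

Because the pattern starts with %, most databases cannot use an ordinary index for it and will still scan the rows; the gain is mainly that the data no longer has to sit in the Java heap.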

Answer 3

If your processor has more than one core, just use a parallel stream.

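import java.util.List;
import java.util.stream.Collectors;

String lowerTag = tag.toLowerCase();                         // compute once, outside the lambda
List<String> result = lines.parallelStream()                 // convert the list to a parallel stream
        .filter(line -> line.toLowerCase().contains(lowerTag)) // keep lines that contain the tag
        .collect(Collectors.toList());                       // collect the matches, in list order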

The code above should process your strings faster on a processor with more than one core, because the parallel stream splits the work across multiple threads (by default, the common fork-join pool).


3 answers

Answer 1
Score 1, accepted, 2017-06-27 10:32:03

Answer 2
Score 0, 2017-06-27 10:52:04

Answer 3
Score 0, 2017-06-27 11:17:32
