
Best way of full text search in multiple files

I am developing a module for full-text search across about 1,000,000 files. Each file is smaller than 500 KB, and search conditions such as AND and OR must be applicable to each file. I may be able to load all the files into in-memory Dictionary or List<string> objects when the app starts.

I am considering the following strategies.

1) List<string> (local, in-memory): use Parallel.ForEach and apply a regex to each string (I need the indexes of the search words).

2) Open source: Lucene

3) Open source: Elasticsearch

4) Open source: YARA (I am aware that it is meant for detecting malware; a developer recommended it to me. I would appreciate it if anyone could share details about it: https://github.com/stellarbear/YaraSharp)

5) Redis or a database (this seems slower than option 1)

Which one is the fastest? Or are there any other strategies?
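To make strategy 1 concrete, here is a minimal sketch of a brute-force in-memory scan. It is written in Python purely for illustration (the question targets .NET, where Parallel.ForEach would play the role of the thread pool), and the names `scan`, `docs`, `terms`, and `mode` are hypothetical. It treats each search word as a literal, case-insensitive pattern and records the character offsets of every match. Note that every query has to touch every document, so the cost grows with the total corpus size.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def scan(docs, terms, mode="AND", workers=4):
    """Scan all in-memory documents and return {name: {term: [offsets]}}.

    docs  -- dict mapping a file name to its full text (assumed already loaded)
    terms -- search words, treated as literal strings
    mode  -- "AND" keeps documents containing every term, "OR" any term
    """
    # Compile each pattern once and reuse it for every document.
    patterns = {t: re.compile(re.escape(t), re.IGNORECASE) for t in terms}

    def check(item):
        name, text = item
        hits = {}
        for term, pattern in patterns.items():
            offsets = [m.start() for m in pattern.finditer(text)]
            if offsets:
                hits[term] = offsets
        if mode == "AND":
            matched = len(hits) == len(patterns)
        else:
            matched = bool(hits)
        return name, hits, matched

    # Threads only illustrate the structure here; in CPython the GIL limits
    # how much a regex scan actually speeds up with more threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return {
            name: hits
            for name, hits, matched in pool.map(check, docs.items())
            if matched
        }
```

Calling `scan(docs, ["red", "fox"], mode="AND")` returns only the documents that contain both words, together with the offsets of each occurrence.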

Your question is very general, and because I don't know all the details of your case, it is difficult to answer. Anyway, I vote for Elasticsearch. You get a wide range of options for analyzing and exploring your text files out of the box. Of all the solutions you mention, I think it will be the easiest way.
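For context on why an index-based engine tends to suit this workload: Lucene and Elasticsearch are built around an inverted index, which maps each term to the documents (and positions) where it occurs, so a query looks up terms instead of rescanning every file. The following is a deliberately simplified sketch of that idea in Python, not how either engine is actually implemented; `build_index` and `search` are hypothetical names, and real engines add text analysis, compression, and relevance scoring on top.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to {doc name: [character offsets]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for m in re.finditer(r"\w+", text):
            index[m.group().lower()].setdefault(name, []).append(m.start())
    return index


def search(index, terms, mode="AND"):
    """Return the names of documents containing all (AND) or any (OR) terms."""
    # One set of document names per term; unknown terms give an empty set.
    postings = [set(index.get(t.lower(), {})) for t in terms]
    if not postings:
        return set()
    combine = set.intersection if mode == "AND" else set.union
    return combine(*postings)
```

The index is built once at startup. After that, an AND query is a set intersection over the posting lists of the query terms and an OR query is a union, and the stored offsets supply the word positions the question asks for.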
