
Is there a standard approach for searching a document for words in a list?

I'm working on an application that needs to search documents for all occurrences of words or phrases from a provided list. This is fairly easy to do by just walking a pointer through the document, but this method does not scale. As the document or dictionary gets larger, the search time increases proportionally, eventually becoming unacceptable. We have experimented with various approaches to reduce the scaling penalty. One such approach is building an index layer for the dictionary. This is our current best strategy, but I'm wondering if there is something better and/or easier.

I'm sure this problem has been solved innumerable times. Is there an approach that has been shown to be optimal or at least close to optimal?
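For illustration, here is a rough sketch of what an index layer over the dictionary might look like. The question does not say how the actual index is built, so everything here is an assumption: the function names (`tokenize`, `build_phrase_index`, `find_phrases`) are hypothetical, matching is case-insensitive on word tokens, and the index simply maps the first word of each phrase to the phrases that start with it. The document is then scanned once, and at each word only the phrases that could begin there are compared.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_phrase_index(phrases):
    """Map the first word of each phrase to the token lists starting with it."""
    index = defaultdict(list)
    for phrase in phrases:
        tokens = tokenize(phrase)
        if tokens:
            index[tokens[0]].append(tokens)
    return index


def find_phrases(document, phrases):
    """Return (token_position, phrase) pairs for every match in the document."""
    index = build_phrase_index(phrases)
    words = tokenize(document)
    matches = []
    for i, word in enumerate(words):
        # Only phrases that start with the current word are compared.
        for tokens in index.get(word, ()):
            if words[i:i + len(tokens)] == tokens:
                matches.append((i, " ".join(tokens)))
    return matches


if __name__ == "__main__":
    text = "The quick brown fox jumps over the lazy dog. The fox sleeps."
    for position, phrase in find_phrases(text, ["fox", "lazy dog", "cat"]):
        print(position, phrase)
```

With this layout the cost per document word depends on how many phrases share that first word rather than on the size of the whole dictionary, which is the kind of saving such an index is meant to give.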

If you are having performance issues, you probably want to start looking at projects specifically meant to deal with full-text searching, like Apache Solr.

Solr allows you to feed it documents, from which it builds an index that makes searches for relevant keywords and phrases extremely fast.
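To give a feel for why an index makes lookups fast, here is a minimal sketch of an inverted index in plain Python. It does not use Solr or its API, and a real engine (Solr is built on Apache Lucene) does far more, such as text analysis, compression, and relevance ranking. The names `tokenize`, `build_inverted_index`, and `search_phrase` are made up for this example, and the tokenization rule is an assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_inverted_index(documents):
    """Map each word to {doc_id: [token positions where it occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def search_phrase(index, phrase):
    """Return {doc_id: [start positions]} for documents containing the phrase."""
    tokens = tokenize(phrase)
    if not tokens or any(token not in index for token in tokens):
        return {}
    results = {}
    for doc_id, starts in index[tokens[0]].items():
        # A start position is a hit if every later word sits at the next offsets.
        hits = [
            start for start in starts
            if all(start + offset in index[token].get(doc_id, ())
                   for offset, token in enumerate(tokens[1:], start=1))
        ]
        if hits:
            results[doc_id] = hits
    return results


if __name__ == "__main__":
    docs = {
        "a": "Solr builds an index",
        "b": "An index makes searches fast, and an index helps",
    }
    index = build_inverted_index(docs)
    print(search_phrase(index, "an index"))
```

The index is built once per document set. After that, each word or phrase from the list is looked up directly instead of rescanning the text, so the work per query no longer grows with the full length of the documents.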
