
How to make inverted index search faster?

Original post: 2012-01-26 05:43:31 | Tags: algorithm, search, full-text-search, parallel-processing, information-retrieval

I am designing the architecture of a full-text search engine. One of the requirements is processing queries over large datasets with low response time. One thing I could figure out is to split the inverted index into partitions. There are two strategies for this: term-based partitioning and document-based partitioning. But I really want to know whether there is any other way to make inverted index search faster over large datasets.
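For readers unfamiliar with the two strategies, here is a minimal sketch of how each one divides the same index. The toy documents, the function names, and the shard placement rules (`doc_id % num_shards` and a byte-sum hash of the term) are assumptions made for illustration, not part of any particular engine.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def partition_by_document(docs, num_shards):
    """Each shard indexes only its own subset of the documents.

    Placement rule (hypothetical): doc_id % num_shards.
    """
    shard_docs = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard_docs[doc_id % num_shards][doc_id] = text
    return [build_index(d) for d in shard_docs]

def partition_by_term(docs, num_shards):
    """Each shard holds the complete posting list for a subset of the terms.

    Placement rule (hypothetical): sum of the term's bytes % num_shards,
    chosen only because it is stable across runs.
    """
    full = build_index(docs)
    shards = [dict() for _ in range(num_shards)]
    for term, postings in full.items():
        shards[sum(term.encode()) % num_shards][term] = postings
    return shards

# Toy corpus (made up for this example).
docs = {
    0: "fast inverted index",
    1: "inverted index search",
    2: "fast search engine",
    3: "distributed search index",
}
doc_shards = partition_by_document(docs, 2)
term_shards = partition_by_term(docs, 2)
```

With document-based partitioning a term such as `index` appears on every shard that holds a matching document, each with a short local posting list. With term-based partitioning the same term lives on exactly one shard, which carries its full posting list.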

1 Answer

This video is a talk by Shay Banon, the developer of ElasticSearch, a distributed full-text search engine. In the video he discusses the pros and cons of term-based and document-based partitioning.

Basically, term-based partitioning generates too much network traffic between processes/nodes, and it is harder to implement well. Document-based partitioning is much simpler to implement and to get results from.
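To make the network-traffic point concrete, here is a rough sketch of a two-term AND query under each scheme. It counts how many doc ids would have to cross the network. The shard contents are a hand-written toy (abbreviated to the terms used here), the function names are hypothetical, and real engines add compression, skip lists, and top-k pruning that this ignores.

```python
def intersect(a, b):
    """Intersect two posting lists and return a sorted result."""
    return sorted(set(a) & set(b))

# Document-partitioned: each shard has postings for its own documents only.
doc_shards = [
    {"search": [2], "index": [0], "fast": [0, 2]},   # shard 0 holds docs 0, 2
    {"search": [1, 3], "index": [1, 3]},             # shard 1 holds docs 1, 3
]

# Term-partitioned: each term's full posting list lives on exactly one shard.
term_shards = [
    {"search": [1, 2, 3], "fast": [0, 2]},
    {"index": [0, 1, 3]},
]

def query_doc_partitioned(shards, terms):
    """Scatter the query to every shard, intersect locally, gather the hits."""
    hits, shipped = [], 0
    for shard in shards:
        local = None
        for term in terms:
            postings = shard.get(term, [])
            local = postings if local is None else intersect(local, postings)
        local = local or []
        shipped += len(local)  # only the shard's final hits are sent back
        hits.extend(local)
    return sorted(hits), shipped

def query_term_partitioned(shards, terms):
    """Fetch each term's full posting list from its owner, then intersect."""
    result, shipped = None, 0
    for term in terms:
        postings = next((s[term] for s in shards if term in s), [])
        shipped += len(postings)  # the whole posting list is sent
        result = postings if result is None else intersect(result, postings)
    return sorted(result or []), shipped

doc_hits, doc_shipped = query_doc_partitioned(doc_shards, ["search", "index"])
term_hits, term_shipped = query_term_partitioned(term_shards, ["search", "index"])
print(doc_hits, doc_shipped)    # [1, 3] 2
print(term_hits, term_shipped)  # [1, 3] 6
```

Both schemes return the same hits, but in this toy the term-partitioned query moves three times as many doc ids, and the gap widens as posting lists grow. The trade-off is that the document-partitioned query has to contact every shard.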

Moreover, in this lecture, Jeffrey Dean also explains the differences and says that Google uses document-based partitioning.

These are the two main ways to distribute your search engine. I'm not aware of any others. That said, you may want to search the Information Retrieval literature for newer work on the subject.