
Way to factor in search locality in Solr/Elasticsearch/Sphinx?

My problem is to search the data of thousands of users, e.g. mailboxes. Almost all the time the search is filtered by user id. How can this locality of searches be taken into account? I'm trying to achieve performance comparable to the case where each user has a dedicated index.

Sharding is not the answer here, because it will already be used (total number of users ~1M); I'm looking for a solution to use inside a shard of ~4k users.

Well, it can be done in Sphinx with attributes. Most of the time you can make the search more efficient by also adding the user id as a fake keyword*. The documents can then be filtered during the full-text stage. (Still keep the attribute too, to avoid the possibility of someone manipulating results by constructing a careful query that returns results from other users.)

  • e.g., index _user1234 in a full-text field, then add it to the query: WHERE MATCH('example _user1234') AND user = 1234 then finds documents just from that user.
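The fake-keyword idea above can be sketched as a small query builder. This is only an illustration under stated assumptions: the index name `mail_index` and the helper `build_user_query` are hypothetical, the attribute is called `user` as in the example above, and the underscore is assumed to be an indexable character in the Sphinx charset configuration. The sanitising step is deliberately crude; it just shows why the attribute filter is kept alongside the keyword.

```python
import re


def build_user_query(user_id: int, text: str) -> str:
    """Build a SphinxQL SELECT restricted to one user's documents.

    `mail_index` is a hypothetical index name; `user` is the attribute
    from the example above.
    """
    uid = int(user_id)
    # Replace everything except word characters and whitespace, so the
    # caller cannot pass quotes or full-text operators through.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # Drop any marker keyword the caller tried to supply themselves.
    words = [w for w in cleaned.split() if not w.startswith("_user")]
    # Append this user's marker keyword to the full-text part.
    match_expr = " ".join(words + [f"_user{uid}"])
    # The attribute filter stays as a second line of defence.
    return (
        "SELECT id FROM mail_index "
        f"WHERE MATCH('{match_expr}') AND user = {uid}"
    )


print(build_user_query(1234, "example"))
```

Even if a marker for another user slipped into the full-text part, the `user = ...` attribute condition would still exclude that user's documents.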

One possible solution is to group the documents of the same user together in the inverted index blocks. Given that an inverted index block is sorted by document id, such grouping can only be done by assigning ids to documents appropriately: the same user's documents should have monotonic (ideally adjacent) ids. Minor violations of this rule are acceptable; they would not harm performance significantly.
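One way to get such ids, for an engine that lets the application supply the numeric document id, is to put the user id in the high bits and a per-user counter in the low bits. The sketch below is an assumption for illustration, not something taken from the answer: `DocIdAllocator` and `SEQ_BITS` are hypothetical names, and the 32-bit split assumes fewer than 2**32 documents per user and ids that fit in 64 bits.

```python
from collections import defaultdict

SEQ_BITS = 32  # assumed: fewer than 2**32 documents per user


class DocIdAllocator:
    """Hand out document ids so that one user's ids sort next to each other."""

    def __init__(self) -> None:
        # Next per-user sequence number, starting at 0 for every user.
        self._next_seq = defaultdict(int)

    def allocate(self, user_id: int) -> int:
        seq = self._next_seq[user_id]
        if seq >= 1 << SEQ_BITS:
            raise OverflowError("per-user sequence space exhausted")
        self._next_seq[user_id] = seq + 1
        # High bits: user id, low bits: per-user counter.
        return (user_id << SEQ_BITS) | seq


alloc = DocIdAllocator()
a = alloc.allocate(7)
b = alloc.allocate(9)
c = alloc.allocate(7)
# Sorting by id groups user 7's documents before user 9's,
# regardless of the order in which they arrived.
print(sorted([a, b, c]) == [a, c, b])
```

Because postings are ordered by document id, ids built this way keep each user's entries in one contiguous run of every posting list, which is the grouping described above.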

Implementations.

Index sorting has just become a first-class citizen in Lucene 6.2.

This could be achieved in Elasticsearch 2.3 (see here). I think it is achievable in Solr in the same way.

As for Sphinx, I suppose the same technique of assigning monotonic document ids should work.

For more technical reasoning, see the previous link.
