
Way to factor in search locality in Solr/Elasticsearch/Sphinx?

I need to search the data of thousands of users, e.g. mailboxes. Searches are almost always filtered by user id. How can this locality of searches be taken into account? I'm trying to achieve performance comparable to the case where each user has a dedicated index.

Sharding is not the answer here because it will be used anyway (the total number of users is ~1M); I'm looking for a solution to use inside a shard of ~4k users.

Well, it can be done in Sphinx with attributes. Most of the time you can make the search more efficient by also adding the user id as a fake keyword, so that the documents are filtered during the full-text stage. (Still keep the attribute too, to avoid the possibility of someone constructing a careful query that returns results from other users.)

  • e.g. add _user1234 to a full-text field of that user's documents; a query with WHERE MATCH('example _user1234') AND user = 1234 then finds documents just from that user.
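
The fake-keyword approach can be sketched as a small Python helper. This is only an illustration under stated assumptions, not Sphinx's own API: the index name `mail_index`, the helper names `user_token` and `build_query`, and the `%s` placeholder style (as used by typical MySQL-protocol client libraries) are all hypothetical choices.

```python
def user_token(user_id: int) -> str:
    """Fake keyword assumed to be stored in a full-text field of every
    document belonging to this user (hypothetical naming scheme)."""
    return f"_user{int(user_id)}"


def build_query(index: str, text: str, user_id: int):
    """Return a (statement, parameters) pair for a per-user search.

    The fake keyword narrows the match during the full-text stage; the
    attribute filter on `user` is kept as a safety net in case the
    user-supplied text contains query operators.
    """
    match_expr = f"{text} {user_token(user_id)}"
    # `index` is assumed to be a trusted, hard-coded name, not user input.
    sql = f"SELECT id FROM {index} WHERE MATCH(%s) AND user = %s"
    return sql, (match_expr, int(user_id))


sql, params = build_query("mail_index", "example", 1234)
```

With a MySQL-protocol client, the returned pair would typically be passed on as something like `cursor.execute(sql, params)`.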

One possible solution is to group the documents of each user together within an inverted index block. Since an inverted index block is sorted by document id, such grouping can be achieved simply by assigning document ids appropriately: one user's documents should get consecutive, monotonically increasing ids. Minor violations of this rule would not harm performance significantly.
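
One way to picture the id assignment is the toy sketch below. It assumes numeric 64-bit style document ids with the user id in the high bits and a per-user sequence number in the low bits; the names `SEQ_BITS`, `make_doc_id` and `postings_for_user` are made up for illustration, and a plain sorted Python list stands in for a posting list. How a real engine maps external ids to its internal ones is not covered here.

```python
from bisect import bisect_left

SEQ_BITS = 32  # assumption: fewer than 2**32 documents per user


def make_doc_id(user_id: int, seq: int) -> int:
    """Pack the user id into the high bits so that sorting by document id
    keeps each user's documents next to each other."""
    if user_id < 0 or not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("user_id or seq out of range")
    return (user_id << SEQ_BITS) | seq


def postings_for_user(postings, user_id):
    """Slice one user's documents out of a posting list sorted by id.

    Because the ids are grouped by user, the matches form one contiguous
    range that two binary searches can locate.
    """
    lo = make_doc_id(user_id, 0)
    hi = make_doc_id(user_id + 1, 0)
    return postings[bisect_left(postings, lo):bisect_left(postings, hi)]


# Toy posting list for a single term, shared by three users.
postings = sorted(make_doc_id(u, s) for u, s in [(7, 0), (9, 5), (7, 1), (8, 0)])
user7_docs = postings_for_user(postings, 7)
```

The point of the sketch is only that a per-user filter touches one contiguous slice of the posting list instead of entries scattered across it.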

Implementations.

Index sorting became a first-class citizen in Lucene 6.2.

This could be achieved in Elasticsearch 2.3 (see here), and I think it's achievable in Solr in the same way.

As for Sphinx, I suppose the same technique of assigning monotonic document ids should work.

For more technical reasoning, see the previous link.
