
Good way to exclude records in SOLR or Elasticsearch

For a matchmaking portal, we have a requirement that if a customer has viewed the complete profile details of a bride or groom, we must exclude that profile from their further search results. Currently, along with other details, we store the ids of members who viewed a profile in a comma-separated field against that bride's or groom's record.

E.g., if A viewed B, then we append A (comma separated) to the saw_me field in B's record.

When searching, say the currently searching member's id is 123456; we then fire a query like

SELECT * FROM profiledetails WHERE (OTHER CON) AND 123456 NOT IN saw_me;
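In Elasticsearch terms, that SQL-style exclusion would typically become a bool query with a must_not clause. A minimal sketch, assuming the field is named saw_me as in the question; the "(OTHER CON)" part is elided in the question, so a match_all placeholder stands in for it:

```python
# Build the Elasticsearch request body that excludes already-viewed profiles.
searcher_id = "123456"

query = {
    "query": {
        "bool": {
            # Placeholder for the other search conditions ("OTHER CON" above).
            "must": [{"match_all": {}}],
            # Drop any profile whose saw_me field contains the searcher's id.
            "must_not": [{"term": {"saw_me": searcher_id}}],
        }
    }
}
```

For this to work, saw_me should be indexed as a multi-valued keyword field (one token per viewer id) rather than a single comma-separated string.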

The problem is that the saw_me field keeps growing without bound. Is there a better way to handle this requirement? Please guide.

If this is using Solr:

  1. First, DON'T add the 'AND NOT ...' clauses to the main query in the q param; add them to fq instead. This has several benefits (the fq will be cached independently of the main query).
  2. Until the list of values reaches maybe the thousands, this approach is simple and should work fine.
  3. Once the list becomes huge, it may be time to move to a post filter with a high cost (so it is evaluated last). The post filter would look up the docs to remove in an external source (redis, a db, ...).

In my opinion, no matter how much the saw_me field grows, it will not make much difference to search time, because tokens are stored in an inverted index, and doc_values are created at index time in a column-major layout for efficient reads and benefit from OS-level caching.
ES handles these things for you efficiently.
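If the list does become a concern, one way to keep it out of the profile document entirely is an Elasticsearch terms lookup: the viewed ids live in a separate document (one per viewer, inverting the question's data model) and are fetched at query time. This is a sketch; the index and field names ("viewed_profiles", "profile_id", "viewed_ids") are illustrative assumptions, not from the question:

```python
searcher_id = "123456"

# Exclude every profile whose profile_id appears in the searcher's own
# viewed-ids document, fetched from the "viewed_profiles" index at query time.
query = {
    "query": {
        "bool": {
            "must_not": [
                {
                    "terms": {
                        "profile_id": {
                            "index": "viewed_profiles",  # where the list lives
                            "id": searcher_id,           # one doc per viewer
                            "path": "viewed_ids",        # field holding the ids
                        }
                    }
                }
            ]
        }
    }
}
```

With this model, recording a view is an update to the viewer's document instead of to every viewed profile.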

