
Good way to exclude records in SOLR or Elasticsearch

For a matchmaking portal, we have a requirement that if a customer has viewed the complete profile details of a bride or groom, we must exclude that profile from their further search results. Currently, along with other details, we store the ids of members who viewed a profile in a comma-separated field against that bride's or groom's record.

E.g., if A viewed B, then we append A (comma separated) to the saw_me field in B's record.

When searching, say the currently searching member's id is 123456; we then fire a query like

SELECT * FROM profiledetails WHERE (OTHER CON) AND 123456 NOT IN saw_me;
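In Elasticsearch terms, that SQL-style exclusion would typically become a bool query with a must_not clause. A minimal sketch, assuming the field is named saw_me as in the question; the "(OTHER CON)" part is elided in the question, so a match_all placeholder stands in for it:

```python
# Build the Elasticsearch request body that excludes already-viewed profiles.
searcher_id = "123456"

query = {
    "query": {
        "bool": {
            # Placeholder for the other search conditions ("OTHER CON" above).
            "must": [{"match_all": {}}],
            # Drop any profile whose saw_me field contains the searcher's id.
            "must_not": [{"term": {"saw_me": searcher_id}}],
        }
    }
}
```

For this to work, saw_me should be indexed as a multi-valued keyword field (one token per viewer id) rather than a single comma-separated string.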

The problem is that the saw_me field keeps growing without bound. Is there a better way to handle this requirement? Please guide.

If this is using Solr:

  1. First, DON'T add the 'AND NOT ...' clauses to the main query in the q param; add them to fq instead. This has several benefits (the fq will be cached independently of the main query).
  2. Until the list of values reaches maybe the thousands, this approach is simple and should work fine.
  3. Once the list becomes huge, it may be time to move to a post filter with a high cost (so it is evaluated last). The post filter would look up the docs to remove in an external source (redis, a db, ...).

In my opinion, no matter how much the saw_me field grows, it will not make much difference to search time, because tokens are stored in an inverted index, and doc_values are created at index time in a column-major layout for efficient reads and benefit from OS-level caching.
ES handles these things for you efficiently.
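If the list does become a concern, one way to keep it out of the profile document entirely is an Elasticsearch terms lookup: the viewed ids live in a separate document (one per viewer, inverting the question's data model) and are fetched at query time. This is a sketch; the index and field names ("viewed_profiles", "profile_id", "viewed_ids") are illustrative assumptions, not from the question:

```python
searcher_id = "123456"

# Exclude every profile whose profile_id appears in the searcher's own
# viewed-ids document, fetched from the "viewed_profiles" index at query time.
query = {
    "query": {
        "bool": {
            "must_not": [
                {
                    "terms": {
                        "profile_id": {
                            "index": "viewed_profiles",  # where the list lives
                            "id": searcher_id,           # one doc per viewer
                            "path": "viewed_ids",        # field holding the ids
                        }
                    }
                }
            ]
        }
    }
}
```

With this model, recording a view is an update to the viewer's document instead of to every viewed profile.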

