Elasticsearch - How to boost score by the results of an aggregation?

My use case is as follows: execute a search against Products and boost each document's score by its salesRank relative to the other documents in the results. The top 10% of sellers should be boosted by a factor of 1.5, and those between the top 10% and the top 25% should be boosted by a factor of 1.25. The percentiles are calculated on the results of the query, not the entire data set. This feature is used for on-the-fly instant results as the user types, so single-character queries would still return results.

So for example, if I search for "Widget" and get back 100 results, the top 10 sellers returned will get boosted by 1.5 and sellers 11 through 25 will get boosted by 1.25.

I immediately thought of using the percentiles aggregation feature to calculate the 75th and 90th percentiles of the result set.

POST /catalog/product/_search?_source_include=name,salesRank
{
  "query": {
    "match_phrase_prefix": {
      "name": "N"
    }
  },
  "aggs": {
    "sales_rank_percentiles": {
      "percentiles": {
        "field" : "salesRank",
        "percents" : [75, 90]
      }
    }
  }
}

This gets me the following:

{
   "hits": {
      "total": 142,
      "max_score": 1.6653868,
      "hits": [
         {
            "_score": 1.6653868,
            "_source": {
               "name": "nylon",
               "salesRank": 46
            }
         },
         {
            "_score": 1.6643861,
            "_source": {
               "name": "neon",
               "salesRank": 358
            }
         },
         ..... <SNIP> .....
      ]
   },
   "aggregations": {
      "sales_rank_percentiles": {
         "values": {
            "75.0": 83.25,
            "90.0": 304
         }
      }
   }
}

So great, that gives me the results and the percentiles. But I would like to boost "neon" above "nylon" because it's a top 10% seller in the results (note: in our system, a higher salesRank value means more sales, so higher values take precedence). The text relevancy is very low since only one character was supplied, so sales rank should have a big effect.
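To make the desired ordering concrete, here is a minimal Python sketch that applies the two boost tiers client-side to a response shaped like the one above. The helper name `boost_by_percentile` is hypothetical, the inclusive `>=` boundaries are an assumption, and re-ranking in the client only reorders the hits that were actually returned, so this illustrates the goal rather than an Elasticsearch feature.

```python
def boost_by_percentile(response, agg_name="sales_rank_percentiles"):
    """Re-sort hits after boosting top sellers (hypothetical helper)."""
    values = response["aggregations"][agg_name]["values"]
    p75, p90 = values["75.0"], values["90.0"]

    boosted = []
    for hit in response["hits"]["hits"]:
        rank = hit["_source"]["salesRank"]
        # Higher salesRank means more sales in this system.
        if rank >= p90:
            factor = 1.5    # top 10% of the result set
        elif rank >= p75:
            factor = 1.25   # between the 75th and 90th percentiles
        else:
            factor = 1.0    # no boost
        # Copy the hit so the original response is left untouched.
        boosted.append(dict(hit, _score=hit["_score"] * factor))

    return sorted(boosted, key=lambda h: h["_score"], reverse=True)


# The two hits and the percentile values shown in the response above.
response = {
    "hits": {
        "hits": [
            {"_score": 1.6653868, "_source": {"name": "nylon", "salesRank": 46}},
            {"_score": 1.6643861, "_source": {"name": "neon", "salesRank": 358}},
        ]
    },
    "aggregations": {
        "sales_rank_percentiles": {"values": {"75.0": 83.25, "90.0": 304}}
    },
}

ranked = boost_by_percentile(response)
for hit in ranked:
    print(hit["_source"]["name"], hit["_score"])
```

With these inputs "neon" (salesRank 358, above the 90th percentile value of 304) is multiplied by 1.5 and moves ahead of "nylon", which falls below the 75th percentile and keeps its original score.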

It seems that a function score query could be used here, but all of the examples in the documentation use doc[] to read values from the document. There aren't any that use other information from the top level of the response, e.g. "aggs": {}. I would basically like to boost a document if its sales rank falls within the 90th-100th or 75th-89th percentiles, by 1.5 and 1.25 respectively.
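One possible workaround, sketched here as an assumption rather than a confirmed answer, is to issue two requests: first the percentiles aggregation, then a second search whose function score query uses range filters built from the returned percentile values. The Python helper `build_boosted_query` below is hypothetical and only constructs the request body; it uses `boost_factor`, which is the 1.x spelling as far as I know (later releases use `weight`), and it has not been tested against 1.2.0.

```python
import json


def build_boosted_query(prefix, p75, p90, field="salesRank"):
    """Build a second-pass request body (hypothetical helper).

    p75 and p90 are the values returned by the percentiles aggregation
    of the first request for the same query.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match_phrase_prefix": {"name": prefix}},
                "functions": [
                    # Top 10% of the result set: multiply the score by 1.5.
                    {
                        "filter": {"range": {field: {"gte": p90}}},
                        "boost_factor": 1.5,
                    },
                    # Between the 75th and 90th percentiles: multiply by 1.25.
                    {
                        "filter": {"range": {field: {"gte": p75, "lt": p90}}},
                        "boost_factor": 1.25,
                    },
                ],
                # The filters are disjoint, so at most one function applies.
                "boost_mode": "multiply",
            }
        }
    }


# Percentile values taken from the example response for the query "N".
body = build_boosted_query("N", 83.25, 304)
print(json.dumps(body, indent=2))
```

The cost of this design is a second round trip per keystroke, and the percentile values themselves are approximations, so documents near a boundary may land in either tier.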

Is this something Elasticsearch supports, or am I going to have to roll my own with a custom script or plugin? Or try a different approach entirely? My preference would be to pre-calculate percentiles, index them, and do a constant score boost, but the stakeholder prefers the run-time calculation.

I'm using Elasticsearch 1.2.0.

What if you keep sellers as a parent document and periodically update their stars (and some boosting factor), say, via some worker? Then you could match products using a has_parent query, and use a combination of score mode and a custom score query to match top products from top sellers?
