Elasticsearch - How to boost score by aggregation results?

Question

My use case is as follows: execute a search against Products and boost each document's score by its salesRank relative to the other documents in the results. The top 10% of sellers should be boosted by a factor of 1.5, and those between the top 10% and the top 25% should be boosted by a factor of 1.25. The percentiles are calculated on the results of the query, not the entire data set. This feature is used for on-the-fly instant results as the user types, so single-character queries would still return results.

So, for example, if I search for "Widget" and get back 100 results, the top 10 sellers returned will get boosted by 1.5, and the next 15 (ranks 11-25) will get boosted by 1.25.

I immediately thought of using the percentiles aggregation to calculate the 75th and 90th percentiles of the result set.

POST /catalog/product/_search?_source_include=name,salesRank
{
  "query": {
    "match_phrase_prefix": {
      "name": "N"
    }
  },
  "aggs": {
    "sales_rank_percentiles": {
      "percentiles": {
        "field" : "salesRank",
        "percents" : [75, 90]
      }
    }
  }
}

This gets me the following:

{
   "hits": {
      "total": 142,
      "max_score": 1.6653868,
      "hits": [
         {
            "_score": 1.6653868,
            "_source": {
               "name": "nylon",
               "salesRank": 46
            }
         },
         {
            "_score": 1.6643861,
            "_source": {
               "name": "neon",
               "salesRank": 358
            }
         },
         ..... <SNIP> .....
      ]
   },
   "aggregations": {
      "sales_rank_percentiles": {
         "values": {
            "75.0": 83.25,
            "90.0": 304
         }
      }
   }
}

So great, that gives me the results and the percentiles. But I would like to boost "neon" above "nylon" because it's a top 10% seller in the results (note: in our system, the salesRank value is descending in precedence, higher value = more sales). The text relevancy is very low since only one character was supplied, so sales rank should have a big effect.
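To make the desired ordering concrete, here is a minimal client-side sketch in Python that applies the two tiers to the sample response above. The function names (`tier_boost`, `rerank`) are hypothetical, and treating a salesRank exactly equal to a percentile value as inside the higher tier is an assumption. It only illustrates the intended arithmetic; it is not something Elasticsearch does in a single request.

```python
# Sketch only: re-score hits on the client using the percentile values
# returned in the same response. Names are hypothetical.

def tier_boost(sales_rank, p75, p90):
    """Return the multiplier for one document (higher salesRank = more sales)."""
    if sales_rank >= p90:   # top 10% of the result set
        return 1.5
    if sales_rank >= p75:   # between the top 25% and the top 10%
        return 1.25
    return 1.0              # everything else keeps its text score


def rerank(response, agg_name="sales_rank_percentiles"):
    """Multiply each hit's _score by its tier boost and sort descending."""
    values = response["aggregations"][agg_name]["values"]
    p75, p90 = values["75.0"], values["90.0"]
    rescored = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        boosted = hit["_score"] * tier_boost(source["salesRank"], p75, p90)
        rescored.append((boosted, source["name"]))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored


# The two hits and the percentile values shown in the response above.
sample_response = {
    "hits": {"hits": [
        {"_score": 1.6653868, "_source": {"name": "nylon", "salesRank": 46}},
        {"_score": 1.6643861, "_source": {"name": "neon", "salesRank": 358}},
    ]},
    "aggregations": {"sales_rank_percentiles": {
        "values": {"75.0": 83.25, "90.0": 304},
    }},
}

ranked = rerank(sample_response)
for score, name in ranked:
    print(name, score)
```

With these values "neon" (salesRank 358, at or above the 90th percentile of 304) is multiplied by 1.5 and moves ahead of "nylon" (salesRank 46, below the 75th percentile), which keeps its original score. Note that this approach can only reorder the hits that were actually fetched, so a strong seller outside the returned page would not be promoted.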

It seems that a function score query could be used here, but all of the examples in the documentation use doc[] to read values from the document. There aren't any that use other information from the top level of the response, e.g. "aggs" {}. I would basically like to boost a document if its sales rank falls within the 90th-100th or 75th-89th percentiles, by 1.5 and 1.25 respectively.

Is this something Elasticsearch supports, or am I going to have to roll my own with a custom script or plugin? Or try a different approach entirely? My preference would be to pre-calculate percentiles, index them, and do a constant score boost, but the stakeholder prefers the run-time calculation.

I'm using Elasticsearch 1.2.0.
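One run-time option that might avoid a custom plugin is a two-pass search: run the percentiles request first, then send a second request whose function score query uses the returned values as range thresholds. The sketch below only builds the body of that second request as a Python dict. The helper name `build_boosted_query` is hypothetical, and the use of `boost_factor` with per-function `filter` clauses reflects my understanding of the 1.x function score syntax, which should be checked against the 1.2 documentation (later versions use `weight` instead).

```python
import json


def build_boosted_query(prefix, p75, p90, field="salesRank"):
    """Build a second-pass search body from first-pass percentile values.

    Assumption: higher salesRank means more sales, so the top tier is
    everything at or above the 90th percentile of the result set.
    """
    return {
        "query": {
            "function_score": {
                # Same text query as the first pass.
                "query": {"match_phrase_prefix": {"name": prefix}},
                "functions": [
                    # Top 10% of the result set.
                    {"filter": {"range": {field: {"gte": p90}}},
                     "boost_factor": 1.5},
                    # Between the 75th and 90th percentiles.
                    {"filter": {"range": {field: {"gte": p75, "lt": p90}}},
                     "boost_factor": 1.25},
                ],
                # Use the first function whose filter matches ...
                "score_mode": "first",
                # ... and multiply it into the text relevance score.
                "boost_mode": "multiply",
            }
        }
    }


# Percentile values taken from the sample response in the question.
body = build_boosted_query("N", p75=83.25, p90=304)
print(json.dumps(body, indent=2))
```

The cost is one extra round trip per keystroke, and the two requests could see slightly different data if the index changes between them. Whether that is acceptable for instant results is a judgment call.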

Answer 1

What if you keep sellers as a parent document and periodically update their stars (and some boosting factor), say, via some worker? Then you could match products using a has_parent query, and use a combination of score mode and a custom score query to match top products from top sellers.

Answer 1, dated 2014-05-30 12:01:20
