How to boost the field-length norm in an Elasticsearch function score?

Question

I know that Elasticsearch takes the length of a field into account when computing the score of the documents retrieved by a query. The shorter the field, the higher the weight (see The field-length norm).

I like this behaviour: when I search for iphone, I am much more interested in iphone 6 than in Crappy accessories for: iphone 5 iphone 5s iphone 6.

Now I would like to boost this effect; let's say I want to double its importance.

I know that one can modify the score using function_score, and I guess that I can achieve what I want via script_score.

I tried to add another field-length norm to the score like this:

    {
      "query": {
        "function_score": {
          "boost_mode": "replace",
          "query": {...},
          "script_score": {
            "script": "_score + norm(doc)"
          }
        }
      }
    }

But I failed badly, getting this error: [No parser for element [function_score]]

EDIT: My first error was that I hadn't wrapped the function score in a "query". I have now edited the code above. My new error says:

GroovyScriptExecutionException[MissingMethodException
[No signature of method: Script5.norm() is applicable for argument types:
(org.elasticsearch.search.lookup.DocLookup) values: 
[<org.elasticsearch.search.lookup.DocLookup@2c935f6f>]
Possible solutions: notify(), wait(), run(), run(), dump(), any()]]

EDIT: I provided a first answer, but I'm hoping for a better one.

Answer 1

It looks like you could achieve that using a field of type token_count together with a field_value_factor function score.

So, something like this in the field mapping:

"name": { 
  "type": "string",
  "fields": {
    "length": { 
      "type":     "token_count",
      "analyzer": "standard"
    }
  }
}

This will use the number of tokens in the field. If you want to use the number of characters, you can change the analyzer from standard to a custom one that tokenizes each character.

Then in the query:
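For the character-count variant, one possible index definition is sketched below. It assumes an ngram tokenizer with both gram sizes set to 1, so that every character becomes its own token. The names `single_char`, `char_count` and the mapping type `product` are made up for illustration, and older Elasticsearch releases may expect the tokenizer type to be spelled `nGram`.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "single_char": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 1
        }
      },
      "analyzer": {
        "char_count": {
          "type": "custom",
          "tokenizer": "single_char"
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "char_count"
            }
          }
        }
      }
    }
  }
}
```

With no `token_chars` configured, the ngram tokenizer keeps every character, so spaces and punctuation are counted too. Adding `"token_chars": ["letter", "digit"]` to the tokenizer should restrict the count to letters and digits, if that is closer to what you want.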

"function_score": {
  ...,
  "field_value_factor": {
    "field": "name.length",
    "modifier": "reciprocal"
  }
}
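Written out in full, the request might look like the sketch below. The `match` query on `name` is only a placeholder for whatever query you already have, and `boost_mode` is spelled out as `multiply` (the default) to make explicit that the original score is multiplied by the reciprocal of the token count.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match": { "name": "iphone" }
      },
      "field_value_factor": {
        "field": "name.length",
        "modifier": "reciprocal"
      },
      "boost_mode": "multiply"
    }
  }
}
```

Under these assumptions, iphone 6 (2 tokens) has its score multiplied by 1/2, while Crappy accessories for: iphone 5 iphone 5s iphone 6 (about 9 tokens with the standard analyzer) is multiplied by roughly 1/9, which widens the gap between short and long names.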

Answer 2

I have something that kind of works. With the following, I subtract the length of a field of interest from the score.

{
 "query": {
   "function_score": {
     "boost_mode": "replace",
     "query": {...},
     "script_score": {
         "script": "_score  - doc['<field_name>'].value.length()"
     }
   }
 }
}

However, I cannot control the relative weight of the number I am subtracting compared to the old score. That's why I am not accepting my answer: I'll wait for better ones for a while. Ideally, I'd love to have a way to access the field-length norm function within the script_score, or to get an equivalent result.
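One way to get a tunable weight, sketched here under several assumptions, is to pass it in as a script parameter and approximate the norm from a token count. This is not the built-in norm: it recomputes 1/sqrt(number of terms), the formula the classic field-length norm is documented as using, from a `name.length` token_count sub-field like the one in Answer 1. The `match` query and the parameter name `length_weight` are hypothetical, and the `params` placement follows the older script_score syntax used in this question; newer versions nest the script differently.

```json
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "match": { "name": "iphone" }
      },
      "script_score": {
        "params": { "length_weight": 2.0 },
        "script": "_score * (1 + length_weight / Math.sqrt(doc['name.length'].value))"
      }
    }
  }
}
```

Raising or lowering `length_weight` changes how strongly short fields are favoured relative to the original score. The script assumes every matched document has at least one token in `name`; a missing value would lead to a division by zero.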

Answer 1: 10, accepted, 2016-01-12 02:42:45

Answer 2: 3, 2015-08-17 23:15:41
