elasticsearch tf-idf and ignoring field length norm in search

I would like to perform searches in elasticsearch that ignore the field-length norm in tf-idf scoring. You can do this by disabling field norms in the index mappings. However, that changes the indexing, and I only want to change the search (I need the norms for other types of searches). What is the best way to accomplish this? I'm using elasticsearch.js as my interface to elasticsearch.

You can't disable norms on a per-search basis, but you can use the Multi Fields API to add an additional field in which norms are disabled:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "string",
          "fields": {
            "no_norms": { 
              "type":  "string",
              "norms": {
                "enabled": false
              }
            }
          }
        }
      }
    }
  }
}

Now you can search on my_field if you need norms and on my_field.no_norms if you don't. You have to reindex the data for the new field to be populated for all documents. Just adding it to the mapping won't change anything for existing docs.
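Because the question uses elasticsearch.js, here is a minimal sketch of the same multi-field setup expressed as client request parameters, plus a per-search choice of field. The helper functions `createIndexParams` and `searchParams` are hypothetical. The index, type and field names are the example names from the mapping above. The `string` type with `norms: { enabled: false }` matches the Elasticsearch version of the original answer. Newer versions use `text` with `norms: false` and have dropped mapping types.

```javascript
// Hypothetical helpers; "my_index", "my_type" and "my_field" are the
// example names from the REST mapping above.

// Parameters for client.indices.create(): "my_field" keeps norms, and its
// sub-field "my_field.no_norms" indexes the same text without them.
function createIndexParams() {
  return {
    index: "my_index",
    body: {
      mappings: {
        my_type: {
          properties: {
            my_field: {
              type: "string",
              fields: {
                no_norms: { type: "string", norms: { enabled: false } }
              }
            }
          }
        }
      }
    }
  };
}

// Parameters for client.search(): the field is chosen per search, which is
// how norms are "ignored" for some queries but not for others.
function searchParams(text, useNorms) {
  const field = useNorms ? "my_field" : "my_field.no_norms";
  return {
    index: "my_index",
    body: { query: { match: { [field]: text } } }
  };
}

// With a connected elasticsearch.js client (not executed here):
// client.indices.create(createIndexParams());
// client.search(searchParams("quick brown fox", false));
```

The mapping stays fixed at index time. Only the field name in the query changes between "with norms" and "without norms" searches.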

So this is the approach I ended up using. Instead of tf-idf (the elasticsearch default at the time), I used BM25, which is supposed to work better. It also has a parameter, b, that controls how much the field-length norm matters: with b = 0 the field length is ignored, and the default value is 0.75. A discussion of BM25 can be found here. In my elasticsearch.yml I have:

index :
  similarity:
    default:
      type: BM25
      b: 0.0
      k1: 1.2
    norm_bm25:
      type: BM25
      b: 0.75
      k1: 1.2
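To see why b = 0 removes the field-length effect, here is a small sketch of the textbook BM25 weight for a single query term. This is the classic formula, not Lucene's exact implementation; Lucene, for example, stores field lengths in a lossy encoded form. The statistics in `common` are made-up illustration values, and `bm25TermScore` is a hypothetical helper.

```javascript
// Textbook BM25 weight of one query term in one field.
// tf: term frequency in the field, dl: field length in terms,
// avgdl: average field length, n: docs containing the term, N: total docs.
function bm25TermScore({ tf, dl, avgdl, n, N, k1 = 1.2, b = 0.75 }) {
  const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
  // Length normalisation: with b = 0 this factor is exactly 1,
  // so the field length dl has no influence on the score.
  const lengthNorm = 1 - b + b * (dl / avgdl);
  return (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}

// Made-up statistics: same term frequency, short field vs. long field.
const common = { tf: 2, avgdl: 10, n: 5, N: 100, k1: 1.2 };

for (const b of [0.75, 0]) {
  const short = bm25TermScore({ ...common, dl: 5, b });
  const long = bm25TermScore({ ...common, dl: 50, b });
  console.log(`b=${b}: short=${short.toFixed(3)} long=${long.toFixed(3)}`);
}
```

With b = 0.75 the short field scores higher than the long one. With b = 0 both get the same score, which is the "ignore the field-length norm" behaviour the question asks for.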

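With this configuration, every field ignores the field-length norm by default (b = 0), and norm_bm25 (b = 0.75) can be assigned to the fields that still need it. For those who use the elasticsearch JavaScript API, the custom similarity is assigned per field in the mapping during index creation:

client.indices.create({
  index: "db",
  body: {
    settings: {
      number_of_shards: 1
    },
    mappings: {
      my_type: {
        properties: {
          // my_type/my_field are placeholders; this field uses the
          // norm-aware "norm_bm25" similarity from elasticsearch.yml
          my_field: { type: "string", similarity: "norm_bm25" }
        }
      }
    }
  }
});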
