
Elasticsearch: Disable IDF completely for search result scoring

This is my sample data in Elasticsearch:

{
    "_index": "12_index",
    "_type": "skill_strings",
    "_id": "AVKv-kM4axmY3fECZw9T",
    "_source": {
       "str": "PHP PHP PHP"
    }
 },
 {
    "_index": "12_index",
    "_type": "skill_strings",
    "_id": "AVKv-kNfaxmY3fECZw9U",
    "_source": {
       "str": "Javascript PHP Javascript Javascript"
    }
 }

And this is my query:


"bool":{
  "must":[
    // some conditions
    {"match_phrase":{"str":"php"}}
  ],
  "should":[
    {"match_phrase":{"sentences":"Javascript"}}
  ]
}

Norms are disabled.

In the result set, php (with 16 occurrences) gets a score of 13.65 (rounded), whereas Javascript, with the same number of occurrences in another document, gets a lower score of 9.58.

For my use case, irrespective of how rare a word is or how short or long the field is, I want the same score for the same term frequency.

How can I do that?

Here are two potential ways:

1) Custom similarity configuration. See the example here for how this is possible: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html#scripted_similarity

2) Create a Scripting Engine:

https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-engine.html

In most cases, (1) should be easiest.
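
As a rough sketch of option (1), the request body below could be sent when creating the index (for example `PUT /12_index`). It defines a scripted similarity whose score depends only on the term frequency (`doc.freq`) and the query boost, so neither IDF nor field length enters the calculation. The similarity name `tf_only` is made up for illustration, and the typeless `mappings` layout assumes a recent Elasticsearch version; older versions that still use mapping types such as `skill_strings` would need the type level in the mapping, and scripted similarity may not be available on them at all.

```json
{
  "settings": {
    "similarity": {
      "tf_only": {
        "type": "scripted",
        "script": {
          "source": "return query.boost * doc.freq;"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "str": {
        "type": "text",
        "similarity": "tf_only"
      }
    }
  }
}
```

With a similarity like this, two documents that contain their respective terms the same number of times should receive the same score for that clause, regardless of how rare each term is in the index. Existing documents would need to be reindexed for a changed similarity to take effect.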
