ElasticSearch index score

I noticed that the more objects there are in a table, the higher the relevance score when searching by keyword. For example, I have two entities, services and news. A service has the heading "tick removal" and a news item has the heading "hand removal". There are 1000 services and 50 news items in total. If I search for the word "removal", the service gets a relevance score of 1200 and the news item a score of 200. How can I set up ElasticSearchBundle so that the number of elements does not play a role during indexing?

It looks to me as though you do not want the relevance calculation to be taken into account, and that you probably want to disable TF-IDF altogether.

TF-IDF takes into account how often a term occurs in a document and how many documents in the index contain it, so the score depends on the size and contents of the index.
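
To make the dependence on index size concrete, here is a minimal Python sketch of the inverse document frequency term as Lucene's BM25 similarity defines it, idf = ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing the term. The function name and the document counts are assumptions chosen for illustration, not the asker's real index statistics, and the full score involves other factors as well.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF part of BM25: depends on index size and on how many docs hold the term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics: assume "removal" occurs in 10 documents of each index.
services_idf = bm25_idf(doc_count=1000, doc_freq=10)
news_idf = bm25_idf(doc_count=50, doc_freq=10)

print(f"services idf: {services_idf:.3f}")
print(f"news idf:     {news_idf:.3f}")
```

Under these assumed numbers the same term gets a larger weight in the larger index, which is one way two otherwise similar headings can end up with different scores.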

Take a look at the Constant Score Query, which might be what you are looking for. You can also use Filter Queries so that the relevance calculation is not taken into account.

Below is how your query could be constructed using both:

POST <your_index_name>/_search
{ 
   "query":{ 
      "constant_score":{ 
         "filter":{ 
            "query_string":{ 
               "query":"removal"
            }
         },
         "boost":1.2
      }
   }
}

Note that when you execute the above query, every matching document gets a constant score of 1.2.

If you do not care about the score at all, it is best to use plain Filter Queries, which simply act as a boolean match.

This link mentions that:

Filter queries do not calculate relevance scores. To speed up performance, Elasticsearch automatically caches frequently used filter queries.

So you also get a performance advantage here.
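
As a rough sketch of what a plain filter query could look like, the Python snippet below builds a search request body that puts a match clause inside the filter context of a bool query. The helper name build_filter_query and the field name "title" are assumptions for illustration; substitute whichever field holds your headings.

```python
import json


def build_filter_query(field: str, text: str) -> dict:
    """Build a search body whose only clause runs in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                # Filter clauses only decide whether a document matches.
                "filter": [
                    {"match": {field: text}}
                ]
            }
        }
    }


# "title" is a hypothetical field name.
body = build_filter_query("title", "removal")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST <your_index_name>/_search. Because nothing in the query runs in scoring context, the hits are not ranked by relevance, so add an explicit sort if the order matters.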

Let me know if this helps.

Perhaps try the "boolean similarity" instead of TF-IDF. There is a nice article about it here: https://saskia-vola.com/when-simple-is-better-the-boolean-similarity-module

The scoring function of the boolean model is much simpler than TF-IDF. Either a term appears in a document or it does not, so there are two possible scores for every term: 1 and 0. If three of your terms appear in a document, that document gets a score of 3, which is much simpler to work with in some cases.

You can enable it simply by adding "similarity": "boolean" to your text fields. The mapping below uses a "doc" type level, as in Elasticsearch 6.x; in 7.x and later, mapping types were removed, so "properties" goes directly under "mappings".
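
The idea can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: it assumes lowercase whitespace tokenization and a boost of 1 for every term, and the function name boolean_score is made up for the example.

```python
def boolean_score(query_terms, document_text):
    """Count how many distinct query terms occur in the document."""
    doc_terms = set(document_text.lower().split())
    unique_query_terms = {term.lower() for term in query_terms}
    return sum(1 for term in unique_query_terms if term in doc_terms)


# The two headings from the question.
docs = {"service": "tick removal", "news": "hand removal"}

for name, text in docs.items():
    print(name, boolean_score(["removal"], text))
# prints:
# service 1
# news 1
```

Both headings get the same score for "removal", no matter how many other documents exist, which is the behaviour the question asks for.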

PUT test
{
  "mappings": {
    "doc" : {
      "properties" : {
        "content" : {
          "type" : "text",
          "similarity" : "boolean"
        }
      }
    }
  }
}
