
Ignoring TF-IDF in Elastic Search

My use case is screening candidate resumes against job description keywords. I cannot afford to have scores change each time a new candidate profile is added to the content list (I assume the IDF will change), so I want to omit TF-IDF scoring.

The indexed document is:

{
  "_index": "crawler_profiles",
  "_type": "_doc",
  "_id": "81ebeb3ff52d90a488b7bce752a4a0cf",
  "_score": 1,
  "_source": {
    "content": "Peachtree MBA"
  }
}

Following the documentation here, I created the following query:

{
  "query": {
    "bool": {
      "should": [
        { "constant_score": {
          "query": { "match": { "content": "corporate strategy" }}
        }},
        { "constant_score": {
          "query": { "match": { "content": "strategy consulting" }}
        }},
        { "constant_score": {
          "query": { "match": { "content": "international strategy" }}
        }},
        { "constant_score": {
          "query": { "match": { "content": "MBA" }}
        }}
      ]
    }
  }
}

I am getting the following error:

[constant_score] query does not support [query]

All I want is a score of 1 if a term occurs one or more times and 0 if it does not occur (effectively skipping TF-IDF). Any help is appreciated.

ES version: 6.4.2

The documentation that you have linked is for ES version 2.x. In 6.4.x the "constant_score" query takes a "filter" clause instead of "query", as shown here: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-constant-score-query.html

You should be able to update your query to something like this:

EDIT: Updated the "term" filters to use "match".

{
  "query": {
    "bool": {
      "should": [
        { "constant_score": {
          "filter": { "match": { "content": "corporate strategy" }}
        }},
        { "constant_score": {
          "filter": { "match": { "content": "strategy consulting" }}
        }},
        { "constant_score": {
          "filter": { "match": { "content": "international strategy" }}
        }},
        { "constant_score": {
          "filter": { "match": { "content": "MBA" }}
        }}
      ]
    }
  }
}
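
If the keyword list comes from a job description, writing one clause per keyword by hand gets tedious. Below is a minimal Python sketch that builds the same kind of request body from a list of keywords. The function name build_keyword_query and its match_type parameter are made up for illustration, and the field is assumed to be "content", as in the indexed document from the question. It only builds and prints the JSON body; sending it to Elasticsearch is left to whatever client you already use.

```python
import json


def build_keyword_query(field, keywords, match_type="match"):
    """Build a bool/should query with one constant_score clause per keyword.

    Each clause wraps its match in a filter, so a matching clause contributes
    a fixed score (the boost) rather than a TF-IDF based one.
    """
    should = [
        {
            "constant_score": {
                "filter": {match_type: {field: keyword}},
                "boost": 1.0,  # explicit here; 1.0 is also the default boost
            }
        }
        for keyword in keywords
    ]
    return {"query": {"bool": {"should": should}}}


# Keywords taken from the question; "content" is the assumed field name.
keywords = [
    "corporate strategy",
    "strategy consulting",
    "international strategy",
    "MBA",
]
body = build_keyword_query("content", keywords)
print(json.dumps(body, indent=2))
```

Each matching "should" clause should add its constant score to the total, so a document is expected to score roughly the number of keyword clauses it matches. Note that "match" on a multi-word keyword such as "corporate strategy" matches documents containing either word by default; if whole phrases are what you want, passing match_type="match_phrase" may be a closer fit.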
