Elasticsearch query_string wildcard does not take length into account

Question

I have some records in Elasticsearch that share the same first letters, for example: word, worda, wordab, wordabc, wordabcd.

I am using query_string with a wildcard:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "word*"
          }
        }
      ]
    }
  }
}

All hits have the same score ("_score": 1.0), so the order is arbitrary. Is it possible to get a score that reflects how much of the word actually matches the term? For instance, word matches the term 100%, worda matches it 80%, and so on.

Answer 1

The reason you get a score of 1 for all matched documents is the following: wildcard and prefix queries are multi-term queries, and in order to execute them Elasticsearch has to rewrite them (expanding the pattern into the actual matching terms).

There are several ways to perform this rewrite. The default one is called constant_score, which assigns every matching document the same constant score (the query boost, 1 by default).
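To illustrate why every hit ends up with 1.0, here is a toy Python model of the rewrite step. It is not Elasticsearch code: a sorted list stands in for the index's term dictionary (each term representing a document that contains it), the prefix is expanded into the matching terms, and a constant_score-style rewrite gives every match the same score. TERMS, expand_prefix and constant_score are hypothetical names used only for this illustration.

```python
from bisect import bisect_left

# Hypothetical term dictionary; each term stands for one document containing it.
TERMS = sorted(["other", "word", "worda", "wordab", "wordabc", "wordabcd", "world"])


def expand_prefix(prefix: str, terms: list[str]) -> list[str]:
    """Rewrite step: collect every term that starts with `prefix` (terms must be sorted)."""
    start = bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    return matches


def constant_score(matched_terms: list[str], boost: float = 1.0) -> dict[str, float]:
    """Constant scoring: every match gets the boost, regardless of term length."""
    return {term: boost for term in matched_terms}


if __name__ == "__main__":
    print(constant_score(expand_prefix("word", TERMS)))
    # {'word': 1.0, 'worda': 1.0, 'wordab': 1.0, 'wordabc': 1.0, 'wordabcd': 1.0}
```

Since all scores are equal, nothing in the score distinguishes word from wordabcd, which is exactly what the question observes.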

Some of the other rewrite methods produce unequal scores, but that scoring relies on TF-IDF-style term statistics (Elasticsearch's default similarity since 5.0 is BM25), e.g. how often worda occurs in the matched document and how many documents in the whole index contain worda. It does not measure how much of the term the pattern covers. As a starting point you could try top_terms_1000 and tweak it later.
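A minimal sketch of trying top_terms_1000: the request body from the question with the rewrite parameter added to query_string. Python is used here only to build and print the JSON; it assumes you send the body yourself (for example with curl or an Elasticsearch client) to the _search endpoint of your index, which the question does not name. prefix_query_with_rewrite is a hypothetical helper.

```python
import json


def prefix_query_with_rewrite(pattern: str, rewrite: str = "top_terms_1000") -> dict:
    """Build a search body whose query_string uses an explicit rewrite method."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": pattern,
                            # Scoring rewrite: keeps the top N expanded terms and
                            # scores them, instead of the default constant_score.
                            "rewrite": rewrite,
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(prefix_query_with_rewrite("word*"), indent=2))
```

The N in top_terms_N caps how many expanded terms are kept, so it is the value to tune if the prefix matches many terms.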

Unfortunately, there is no perfect out-of-the-box way to get the expected behaviour.

One possible way to approximate it is to adapt the Edge NGram tokenizer so that it produces the following tokens from wordabc:

w, wo, wor, word, ...
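A sketch of the Edge NGram approach, assuming Elasticsearch 7 or later (typeless mappings). edge_ngrams is a hypothetical helper that mimics, in plain Python, the prefixes the tokenizer emits for a single word; INDEX_BODY is a create-index body in which prefix_tokenizer, prefix_analyzer and the field name are placeholders. The search_analyzer is set to standard so that the query text itself is not split into n-grams.

```python
import json


def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Mimic the prefixes an edge_ngram tokenizer emits for one word."""
    upper = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, upper + 1)]


# Hypothetical create-index body; analyzer, tokenizer and field names are placeholders.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "prefix_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "prefix_analyzer": {
                    "type": "custom",
                    "tokenizer": "prefix_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "prefix_analyzer",   # index time: store all prefixes
                "search_analyzer": "standard",   # query time: keep "word" whole
            }
        }
    },
}

if __name__ == "__main__":
    print(edge_ngrams("wordabc"))
    # ['w', 'wo', 'wor', 'word', 'worda', 'wordab', 'wordabc']
    print(json.dumps(INDEX_BODY, indent=2))
```

With a mapping like this the wildcard is no longer needed: a plain match query for word on the name field hits the indexed token word directly.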

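In this case the query can produce a more meaningful score: every document contains the token word exactly once, and since BM25 normalizes by field length and longer words produce more n-gram tokens, shorter words such as word should tend to rank above wordabcd. For the exact expected outcome (a match percentage) you would need to build a custom query and scoring mechanism.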
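If you need the exact percentage, one option (a different technique from the rewrite methods above) is a script_score query that divides the prefix length by the length of the stored value. The sketch below assumes Elasticsearch 7.0 or later and a hypothetical text field name with the default name.keyword sub-field created by dynamic mapping. match_ratio is a local Python mirror of the same formula, handy for checking the scores you expect; percent_match_query is a hypothetical helper that only builds the request body.

```python
import json


def match_ratio(prefix: str, term: str) -> float:
    """Fraction of `term` covered by `prefix`, i.e. the score the question asks for."""
    if not term.startswith(prefix):
        return 0.0
    return len(prefix) / len(term)


def percent_match_query(prefix: str, field: str = "name.keyword") -> dict:
    """Build a script_score search body that scores prefix hits by match_ratio."""
    return {
        "query": {
            "script_score": {
                # Only documents whose keyword value starts with the prefix are scored.
                "query": {"prefix": {field: prefix}},
                "script": {
                    # Painless version of match_ratio, evaluated per document.
                    "source": f"params.prefix_len / (double) doc['{field}'].value.length()",
                    "params": {"prefix_len": len(prefix)},
                },
            }
        }
    }


if __name__ == "__main__":
    for word in ["word", "worda", "wordab", "wordabc", "wordabcd"]:
        print(word, match_ratio("word", word))
    print(json.dumps(percent_match_query("word"), indent=2))
```

A prefix query on a keyword field is case-sensitive, and the script runs once per matching document, so it is worth checking its cost on large result sets.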

Answer 1: score 0, posted 2019-01-18 11:18:05
