How to control the scoring or ordering of results when using ngram in Elasticsearch?

Question

I am using Elasticsearch 6.x.

I have created an index test_index with the document type doc as follows:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": [
            "letter",
            "digit",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "ngram": {
              "type": "text",
              "fielddata": true,
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}

I have indexed data as follows:

PUT /test_index/doc/1
{
    "my_text": "ohio"
}
PUT /test_index/doc/2
{
    "my_text": "ohlin"
}
PUT /test_index/doc/3
{
    "my_text": "john"
}

Then I ran this search query:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "oh",
            "fields": [
              "my_text^5",
              "my_text.ngram"
            ]
          }
        }
      ]
    }
  }
}

And got the response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.0042334,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 1.0042334,
        "_source": {
          "my_text": "ohio"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_score": 0.97201055,
        "_source": {
          "my_text": "john"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.80404717,
        "_source": {
          "my_text": "ohlin"
        }
      }
    ]
  }
}

Here we can see that when I searched for oh, I got the results in this order:

-> ohio
-> john
-> ohlin

But I want the scoring and ordering of the results to give higher priority to documents where the search term matches as a prefix:

-> ohio
-> ohlin
-> john

How can I achieve this result? What approaches can I take here? Thanks in advance.

Answer 1

You should add a new subfield with a new analyzer that uses the edge_ngram tokenizer, then add the new subfield to your multi_match query.
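As a rough sketch of what that could look like, the request below recreates the question's index with one extra subfield. The names my_edge_analyzer, my_edge_ngram_tokenizer and the subfield name edge are hypothetical, chosen only for illustration, and the gram sizes simply mirror the question's ngram settings. It assumes the index is created fresh and the three documents are indexed again afterwards, so that the new subfield is populated.

```json
# Sketch only: "my_edge_analyzer", "my_edge_ngram_tokenizer" and the "edge"
# subfield are hypothetical names, not part of the original mapping.
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        },
        "my_edge_analyzer": {
          "type": "custom",
          "tokenizer": "my_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": ["letter", "digit", "punctuation"]
        },
        "my_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": ["letter", "digit", "punctuation"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "ngram": {
              "type": "text",
              "fielddata": true,
              "analyzer": "my_analyzer"
            },
            "edge": {
              "type": "text",
              "analyzer": "my_edge_analyzer"
            }
          }
        }
      }
    }
  }
}
```

With this tokenizer, "john" is indexed in the edge subfield as j, jo, joh, john, so a search for oh finds nothing there, while "ohio" and "ohlin" both produce the token oh. fielddata is left off the new subfield because it is not needed for scoring.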

You then need to use the most_fields type for your multi_match query. Only the documents that start with the search term will match on this subfield, so they will be boosted relative to the other matching documents.
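A minimal sketch of such a query follows, assuming an edge_ngram subfield exists under the hypothetical name my_text.edge (that name is an assumption, not part of the question's mapping). The default multi_match type, best_fields, takes the score of the single best matching field, whereas most_fields combines the scores of all matching fields, which is what lets a prefix match add to the ngram match.

```json
# Sketch only: "my_text.edge" is a hypothetical edge_ngram subfield.
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "oh",
      "type": "most_fields",
      "fields": [
        "my_text^5",
        "my_text.ngram",
        "my_text.edge"
      ]
    }
  }
}
```

Under these assumptions, ohio and ohlin should match on both my_text.ngram and my_text.edge, while john matches only on my_text.ngram. If the gap is still too small, the prefix subfield could be given its own boost, for example my_text.edge^3.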

Answer 1 | 0 | 2018-08-14 12:00:29