How to control scoring or ordering of results while using ngram in Elasticsearch?

I am using Elasticsearch 6.x.

I have created an index test_index with mapping type doc as follows:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": [
            "letter",
            "digit",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "ngram": {
              "type": "text",
              "fielddata": true,
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}

I have indexed data as follows:

PUT /test_index/doc/1
{
    "my_text": "ohio"
}
PUT /test_index/doc/2
{
    "my_text": "ohlin"
}
PUT /test_index/doc/3
{
    "my_text": "john"
}

Then I ran this search query:

GET /test_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "oh",
            "fields": [
              "my_text^5",
              "my_text.ngram"
            ]
          }
        }
      ]
    }
  }
}

And got this response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.0042334,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 1.0042334,
        "_source": {
          "my_text": "ohio"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_score": 0.97201055,
        "_source": {
          "my_text": "john"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.80404717,
        "_source": {
          "my_text": "ohlin"
        }
      }
    ]
  }
}

Here we can see that when I searched for oh, I got the results in this order:

-> ohio
-> john
-> ohlin

But I want the results scored and ordered so that documents matching on a prefix get higher priority:

-> ohio
-> ohlin
-> john

How can I achieve this result? What approaches can I take here? Thanks in advance.

You should add a new subfield whose analyzer uses the edge_ngram tokenizer, then include that subfield in your multi_match query.
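A minimal sketch of such a mapping, assuming the index is created from scratch (so existing documents would need to be indexed again). The names my_edge_analyzer, my_edge_ngram_tokenizer and the subfield edge_ngram are hypothetical and do not come from the question; the gram sizes are simply copied from the existing ngram tokenizer, and fielddata is left out because it is not needed for scoring.

```json
# Hypothetical names: my_edge_analyzer, my_edge_ngram_tokenizer, my_text.edge_ngram
# The ngram analyzer is repeated from the question so the request is self-contained.
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        },
        "my_edge_analyzer": {
          "type": "custom",
          "tokenizer": "my_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": 1,
          "max_gram": 7,
          "token_chars": ["letter", "digit", "punctuation"]
        },
        "my_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 7,
          "token_chars": ["letter", "digit", "punctuation"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "my_analyzer"
            },
            "edge_ngram": {
              "type": "text",
              "analyzer": "my_edge_analyzer"
            }
          }
        }
      }
    }
  }
}
```

With this analyzer, "ohio" is indexed in the subfield as o, oh, ohi, ohio and "john" as j, jo, joh, john, so a search for oh can match "ohio" and "ohlin" there but not "john".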

You then need to use the most_fields type for the multi_match query, which combines the scores from every field that matches. Only documents that start with the search term match on the new subfield, so they are boosted above the other matching documents.
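As a sketch, the query from the question could be adapted like this, assuming the edge-ngram subfield is called my_text.edge_ngram (a hypothetical name, not given in the answer). The boost on my_text is kept from the original query; a boost on the new subfield could be added if prefix matches should count for more.

```json
# my_text.edge_ngram is a hypothetical subfield analyzed with an edge_ngram tokenizer
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "oh",
      "type": "most_fields",
      "fields": [
        "my_text^5",
        "my_text.ngram",
        "my_text.edge_ngram"
      ]
    }
  }
}
```

"john" still matches through my_text.ngram, but it gets no contribution from the edge-ngram subfield, so "ohio" and "ohlin" should be scored ahead of it.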
