简体   繁体   中英

How to control scoring or ordering of results while using ngram in Elasticsearch?

I am using Elasticsearch 6.X. .

I have created an index test_index with index type doc as follow:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": [
            "letter",
            "digit",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "ngram": {
              "type": "text",
              "fielddata": true,
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}

I have indexed data as follow:

PUT /text_index/doc/1
{
    "my_text": "ohio"
}
PUT /text_index/doc/2
{
    "my_text": "ohlin"
}
PUT /text_index/doc/3
{
    "my_text": "john"
}

Then I used search query:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "oh",
            "fields": [
              "my_text^5",
              "my_text.ngram"
            ]
          }
        }
      ]
    }
  }
}

And got the response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.0042334,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 1.0042334,
        "_source": {
          "my_text": "ohio"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_score": 0.97201055,
        "_source": {
          "my_text": "john"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.80404717,
        "_source": {
          "my_text": "ohlin"
        }
      }
    ]
  }
}

Here, we can see the when I searched for oh , I got results in the order:

-> ohio
-> john
-> ohlin

But, I want to have scoring and order of the results in a way which gives higher priority to matching prefix:

-> ohio
-> ohlin
-> john

How can I achieve such result ? What approaches can I take here ? Thanks in advance.

You should add a new subfield with a new analyzer using the edge_ngram tokenizer then add the new subfield in your multimatch.

You need then to use the type most_fields for your multimatch query. Then only the documents starting by the search term will match on this subfield and then will be boosted against others matching documents.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM