How to control scoring or ordering of results while using ngram in Elasticsearch?

Question

I am using Elasticsearch 6.x.

I created an index test_index with the mapping type doc as follows:

PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": "1",
          "max_gram": "7",
          "token_chars": [
            "letter",
            "digit",
            "punctuation"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "my_text": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "ngram": {
              "type": "text",
              "fielddata": true,
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}
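
For reference, here is a rough Python emulation of the grams my_ngram_tokenizer (min_gram 1, max_gram 7) should produce for each value. It is a sketch, not Elasticsearch code: it assumes each value is a single token made of letters, so token_chars splitting is ignored, and the helper name ngrams is hypothetical. Every value, including john, yields the gram oh.

```python
def ngrams(token, min_gram=1, max_gram=7):
    # Hypothetical helper emulating the ngram tokenizer on one token.
    # Assumes the token needs no further splitting by token_chars.
    grams = []
    for start in range(len(token)):
        for length in range(min_gram, max_gram + 1):
            if start + length > len(token):
                break  # max_gram is capped by the remaining characters
            grams.append(token[start:start + length])
    return grams


for word in ["ohio", "ohlin", "john"]:
    print(word, ngrams(word))
```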

I indexed the following documents:

PUT /test_index/doc/1
{
    "my_text": "ohio"
}
PUT /test_index/doc/2
{
    "my_text": "ohlin"
}
PUT /test_index/doc/3
{
    "my_text": "john"
}

Then I ran this search query:

GET /test_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "oh",
            "fields": [
              "my_text^5",
              "my_text.ngram"
            ]
          }
        }
      ]
    }
  }
}

And got the response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.0042334,
    "hits": [
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_score": 1.0042334,
        "_source": {
          "my_text": "ohio"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_score": 0.97201055,
        "_source": {
          "my_text": "john"
        }
      },
      {
        "_index": "test_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.80404717,
        "_source": {
          "my_text": "ohlin"
        }
      }
    ]
  }
}

Here we can see that when I searched for "oh", I got the results in this order:

-> ohio
-> john
-> ohlin
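
One way to see why john is a hit at all is to look at what each field does with the query "oh". The sketch below assumes a simplified stand-in for the standard analyzer (lowercase, split on non-alphanumerics) and relies on my_text.ngram having no search_analyzer, so the query is analyzed with my_analyzer too and becomes o, oh and h. All helper names are hypothetical. Under those assumptions the boosted my_text field matches no document, since oh is not a whole word in any of them, so the ^5 boost never applies and every hit is scored on my_text.ngram alone, where john matches just like the other two values.

```python
import re


def standard_tokens(text):
    # Simplified stand-in for the standard analyzer (ASCII words only):
    # lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def ngrams(text, min_gram=1, max_gram=7):
    # Emulates my_ngram_tokenizer on a single token.
    return [text[i:i + n]
            for i in range(len(text))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(text)]


docs = {"1": "ohio", "2": "ohlin", "3": "john"}
query = "oh"


def matching_docs(analyze):
    # Same analyzer at index and search time; a document matches if it shares
    # at least one term with the query (multi_match's default "or" operator).
    query_terms = set(analyze(query))
    return sorted(doc_id for doc_id, text in docs.items()
                  if query_terms & set(analyze(text)))


my_text_hits = matching_docs(standard_tokens)
ngram_hits = matching_docs(ngrams)
print("my_text:", my_text_hits)        # my_text: []
print("my_text.ngram:", ngram_hits)    # my_text.ngram: ['1', '2', '3']
```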

But I want the scoring, and therefore the order of the results, to give higher priority to values that start with the search term (prefix matches):

-> ohio
-> ohlin
-> john

How can I achieve this result? What approaches can I take here? Thanks in advance.

Answer 1

You should add a new subfield with a new analyzer that uses the edge_ngram tokenizer, then add that subfield to the fields of your multi_match query.
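
The difference from the existing ngram subfield is that edge_ngram only keeps grams that start at the beginning of the token. A rough Python emulation (hypothetical helper edge_ngrams, single-token input assumed, not Elasticsearch code) shows that oh is a prefix gram of ohio and ohlin but not of john:

```python
def edge_ngrams(token, min_gram=1, max_gram=7):
    # Hypothetical helper emulating edge_ngram on one token:
    # only grams anchored at the first character are emitted.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


docs = {"1": "ohio", "2": "ohlin", "3": "john"}
for doc_id, text in docs.items():
    grams = edge_ngrams(text)
    print(doc_id, text, grams, "contains 'oh':", "oh" in grams)
```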

You then need to set the type of your multi_match query to most_fields, so that the scores of all matching fields are combined. Only the documents starting with the search term will match on the new subfield, so they get that extra score and are boosted above the other matching documents.
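
A sketch of what this could look like for the index in the question, written as Python dicts and printed as console-style requests. The names my_edge_tokenizer, my_edge_analyzer and the edge subfield are hypothetical; setting search_analyzer to standard on that subfield (so oh stays a single term) is an added choice, not something the answer above requires; and the fielddata flags are left out because full-text matching does not use them. It also assumes the index is deleted, recreated with index_body, and the three documents are indexed again.

```python
import json

# Sketch of recreated index settings and mapping. The names my_edge_tokenizer,
# my_edge_analyzer and the subfield "edge" are hypothetical.
index_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "analysis": {
            "analyzer": {
                "my_analyzer": {"type": "custom", "tokenizer": "my_ngram_tokenizer"},
                # Prefix-only analyzer; lowercased so it lines up with the
                # standard search_analyzer used on the subfield below.
                "my_edge_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_edge_tokenizer",
                    "filter": ["lowercase"],
                },
            },
            "tokenizer": {
                "my_ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 7,
                    "token_chars": ["letter", "digit", "punctuation"],
                },
                "my_edge_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 7,
                    "token_chars": ["letter", "digit", "punctuation"],
                },
            },
        },
    },
    "mappings": {
        "doc": {
            "properties": {
                "my_text": {
                    "type": "text",
                    "fields": {
                        "ngram": {"type": "text", "analyzer": "my_analyzer"},
                        # Search with the standard analyzer so "oh" stays one
                        # term and only matches indexed prefixes.
                        "edge": {
                            "type": "text",
                            "analyzer": "my_edge_analyzer",
                            "search_analyzer": "standard",
                        },
                    },
                }
            }
        }
    },
}

# most_fields combines the scores of every matching field, so a hit on
# my_text.edge adds to the my_text.ngram score instead of competing with it.
search_body = {
    "query": {
        "multi_match": {
            "query": "oh",
            "type": "most_fields",
            "fields": ["my_text^5", "my_text.ngram", "my_text.edge"],
        }
    }
}

if __name__ == "__main__":
    print("PUT /test_index")
    print(json.dumps(index_body, indent=2))
    print("GET /test_index/_search")
    print(json.dumps(search_body, indent=2))
```

If ohlin still does not rank above john, adding a boost to the subfield (for example my_text.edge^2) is one knob to try.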

solution1
0 2018-08-14 12:00:29

How to control scoring or ordering of results while using ngram in Elasticsearch?

Question

1 answers

solution1 0 2018-08-14 12:00:29

solution1
0 2018-08-14 12:00:29