
Elasticsearch: sort documents by the index of the search string in a text field

I have Elasticsearch data like this:

PUT /text/_doc/1
{
  "name": "pdf1",
  "text":"For the past six weeks. The unemployment crisis has unfolded so suddenly and rapidly."
}
PUT /text/_doc/2
{
  "name": "pdf2",
  "text":"The unemployment crisis has unfolded so suddenly and rapidly."
}

In this example I am running a full-text search for all documents that contain the substring "unemployment" in the "text" field. I want the results sorted in ascending order of the index at which "unemployment" occurs in the "text" field. For example, "unemployment" starts at index 4 in doc 2 (and at index 28 in doc 1), so I want doc 2 to be returned first.

GET /text/_search?pretty
{
  "query": {
    "match": {
      "text": "unemployment"
    }
  }
}

I have tried a few things, such as term vectors. Here is the mapping I used, but it didn't help:

PUT text/_mapping
{
    "properties": {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            }
          }
        },
        "text" : {
          "type" : "text",
          "term_vector": "with_positions_offsets"
        }
      }
}
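For reference, the term vectors API shows what a mapping like this records. The request below is a sketch (term and field statistics are turned off only to keep the response short); with the standard analyzer it should report "unemployment" in doc 2 at position 1 with start_offset 4 and end_offset 16, which is exactly the character index the question is after. As far as I know, however, these offsets are not exposed to sort or to script scoring in current Elasticsearch versions, which would explain why the mapping change alone did not help.

```
GET /text/_termvectors/2
{
  "fields": ["text"],
  "positions": true,
  "offsets": true,
  "term_statistics": false,
  "field_statistics": false
}
```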

Can anyone help me write the right mapping and search query?

Thanks in advance!

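Try this query. The match query selects the documents, and the script_score function scores each hit by how early "unemployment" appears in text.keyword. With "boost_mode": "replace" the relevance score is discarded, so only the position decides the order; adding 1 to the index avoids dividing by zero when the word is at the very start of the field.

GET text/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "unemployment"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                // no keyword value (e.g. text longer than ignore_above): lowest score
                if (doc['text.keyword'].size() == 0) {
                  return 0;
                }
                def docval = doc['text.keyword'].value;
                def index = docval.indexOf('unemployment');

                // the sooner the word appears the better, so 'invert' the index;
                // index + 1 keeps position 0 from causing a division by zero
                return index > -1 ? 1.0 / (index + 1) : 0;
              """
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}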
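If the goal is a strict ordering rather than a score, a script-based sort may be more direct: the match query only selects the hits, and a _script sort of type number orders them by the value returned from indexOf, ascending. This is a sketch assuming the auto-generated text.keyword sub-field; the params key term is a name I chose so the search string is not hard-coded in the script. Note that the script runs once per matching document, so it can be slow on large result sets.

```
GET text/_search
{
  "query": {
    "match": {
      "text": "unemployment"
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "order": "asc",
      "script": {
        "lang": "painless",
        "params": {
          "term": "unemployment"
        },
        "source": """
          // no keyword value (e.g. text longer than ignore_above): sort last
          if (doc['text.keyword'].size() == 0) {
            return Integer.MAX_VALUE;
          }
          int index = doc['text.keyword'].value.indexOf(params.term);
          // substring not found: sort last; otherwise a smaller index sorts first
          return index > -1 ? index : Integer.MAX_VALUE;
        """
      }
    }
  }
}
```

With this sort the hits carry a sort value (4 for doc 2) instead of a relevance score, so _score is not used for ordering.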

This uses the auto-generated mapping:

{
  "text" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

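Note that indexOf is case-sensitive, so it would be reasonable to add a keyword sub-field with a lowercase normalizer and read that field in the script instead. Also note that with "ignore_above": 256, values longer than 256 characters are not indexed in text.keyword at all, so such documents have no value for the script to read. This should get you on the right path.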
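A mapping along these lines adds such a lowercase sub-field. It is a sketch assuming the index is created from scratch (delete text first or use a new index name, then reindex), because analysis settings cannot simply be added to an open index; lowercase_normalizer and the sub-field name lower are hypothetical names. The ignore_above of 8191 is chosen so that even all-4-byte UTF-8 text stays under Lucene's 32766-byte term limit, but keeping long text as a keyword field is still costly.

```
PUT text
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "text": {
        "type": "text",
        "fields": {
          "lower": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer",
            "ignore_above": 8191
          }
        }
      }
    }
  }
}
```

In the script, read doc['text.lower'] instead of doc['text.keyword'] and search for the term in lowercase.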
