简体   繁体   中英

Elastic search apply boost based on nested field value

Below is my indexed document

 {
  "defaultBoostValue":1.01,
  "boostDetails": [
    {
      "Type": "Type1",
      "value": 1.0001
    },
    {
      "Type": "Type2",
      "value": 1.002
    },
    {
      "Type": "Type3",
      "value": 1.0005
    }
  ]
}

i want to apply boost based on value passed, so suppose i pass Type 1 then boost applied will be 1.0001 and if that Type1 does not exist then it will use defaultBoostValue below is my query which works but quite slow, is there any way to optimize it further

Original question Above query works but is slow as we are using _source

   {
  "query": {
    "function_score": {
      "boost_mode": "multiply",
      "functions": [
          "script_score": {
            "script": {
              "source": """
                double findBoost(Map params_copy) {
                    for (def group : params_copy._source.boostDetails) {
                        if (group['Type'] == params_copy.preferredBoostType ) {
                            return group['value'];
                        }
                    }
                    return params_copy._source['defaultBoostValue'];
                }
                
                return findBoost(params)
              """,
              "params": {
                "preferredBoostType": "Type1"
              }
            }
          }
        }
      ]
    }
  }
}

I have removed the condition of not having dynamic mapping, if changing the structure of boostDetails mapping can help then I am ok but please explain how it can help and be faster to query also please give mapping types and modified structure if answer contains modifying mapping.

Using dynamic mappings (lots of fields)

It looks like you adjusted the doc structure compared to your original question .

The query above was thought for nested fields which cannot be easily iterated in a script for performance reasons. Having said that, the above is an even slower workaround which accesses the docs' _source and iterates its contents. But keep in mind that it's not recommended to access the _source in scripts!

If your docs aren't nested anymore, you can access the so-called doc values which are much more optimized for query-time access:

{
  "query": {
    "function_score": {
      ...
      "functions": [
        {
          ...
          "script_score": {
            "script": {
              "lang": "painless",
              "source": """
                try {
                  if (doc['boost.boostType.keyword'].value == params.preferredBoostType) {
                    return doc['boost.boostFactor'].value;
                  } else {
                    throw new Exception();
                  }
                } catch(Exception e) {
                  return doc['fallbackBoostFactor'].value;
                }
              """,
              "params": {
                "preferredBoostType": "Type1"
              }
            }
          }
        }
      ]
    }
  }
}

thus speeding up your function score query.


Alternative using an ordered list of values

Since the nested iteration is slow and dynamic mappings are blowing up your index, you could store your boosts in a standardized ordered list in each document:

"boostValues": [1.0001, 1.002, 1.0005, ..., 1.1]

and keep track of the corresponding boost types' order in the backend where you construct the queries:

var boostTypes = ["Type1", "Type2", "Type3", ..., "TypeN"]

So something like n-hot vectors .

Then, as you construct the Elasticsearch query, you'd look up the array index of the boostValues based on the boostType and pass this array index to the script query from above which'd access the corresponding boostValues doc-value.

This is guaranteed to be faster than _source access. But it's required that you always keep your boostTypes and boostValues in sync -- preferably append-only (as you add new boostTypes , the list grows in one dimension).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM