
Elasticsearch: score documents based on how numerically close a string is

Assume we have documents in the following format indexed in Elasticsearch:

{
  "street": "Adenauer Allee",
  "number": "119",
  "zipcode": "53113"
}

and a query like:

{
    "from": 0,
    "size": 1,
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "zipcode": {
                            "query": "53113",
                            "fuzziness": "0"
                        }
                    }
                },
                {
                    "match": {
                        "street": {
                            "query": "Adenauer Allee",
                            "fuzziness": "auto"
                        }
                    }
                }
            ],
            "should": [
                {
                    "match": {
                        "number": {
                            "query": "119"
                        }
                    }
                } 
            ]
        }
    }
}

Now suppose our index contains three documents with

street: "Adenauer Allee"
zipcode: "53113"

but different house numbers:

doc1: number: "11"
doc2: number: "120"
doc3: number: "10a"

(Note the "a" in doc3.)

The query above returns doc1, with number "11", as the result (since it is closer alphanumerically).

The desired behavior is to return the document with the closest numerical value first. In the scenario above, this is doc2, with number "120".
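To make the desired ordering concrete outside Elasticsearch, here is a minimal Python sketch. It assumes (as an illustration only) that "numerical closeness" means the absolute difference between the leading digits of each house number and the requested one, so "10a" is treated as 10. The helper names `leading_number` and `rank_by_numeric_distance` are hypothetical.

```python
import re


def leading_number(house_number: str) -> int:
    """Return the leading integer of a house number, e.g. "10a" -> 10."""
    match = re.match(r"\d+", house_number.strip())
    if match is None:
        raise ValueError(f"no leading digits in {house_number!r}")
    return int(match.group())


def rank_by_numeric_distance(target: str, docs: dict) -> list:
    """Order doc ids by |leading_number(doc) - leading_number(target)|, closest first."""
    origin = leading_number(target)
    return sorted(docs, key=lambda doc_id: abs(leading_number(docs[doc_id]) - origin))


docs = {"doc1": "11", "doc2": "120", "doc3": "10a"}
print(rank_by_numeric_distance("119", docs))  # ['doc2', 'doc1', 'doc3']
```

The distances are 108 for doc1, 1 for doc2 and 109 for doc3, which is why doc2 should come first.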

How can I achieve that?

Elasticsearch info:

{
"name": "193a315bccae",
"cluster_name": "demo",
"cluster_uuid": "kg3tZZOyqOgqTbn_elqs_g",
"version": {
"number": "7.5.1",
"build_flavor": "default",
"build_type": "docker",
"build_hash": "3ae9ac9a93c95bd0cdc054951cf95d88e1e18d96",
"build_date": "2019-12-16T22:57:37.835892Z",
"build_snapshot": false,
"lucene_version": "8.3.0",
"minimum_wire_compatibility_version": "6.8.0",
"minimum_index_compatibility_version": "6.0.0-beta1"
},
"tagline": "You Know, for Search"
}

The script_score query allows you to implement custom scoring logic (see the Elasticsearch reference: Script Score Query). Rather than writing your own script, you can also use one of the predefined decay functions for numeric fields, assuming you have "cleaned up" the street numbers by stripping their non-numeric characters (you can convert number into a multi-field and store its numeric part separately, e.g. number.numeric).
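A minimal sketch of that approach, written as Python that builds the request body. It assumes a numeric field `number.numeric` (the example name from the answer) that already holds the cleaned-up house number for every matching document (e.g. 10 for "10a"); how that field gets populated is left open here. The Painless call `decayNumericGauss(origin, scale, offset, decay, docValue)` is, to my understanding of the 7.x Script Score Query reference, one of the built-in numeric decay functions; the `scale`, `decay` and `offset` values are arbitrary choices for illustration. The helper `gauss_decay` is a hypothetical local re-implementation of the documented Gaussian formula, used only to preview the ranking.

```python
import json


def gauss_decay(origin: float, scale: float, offset: float, decay: float, value: float) -> float:
    """Local re-implementation of the documented Gaussian decay (illustration only).

    exp(-max(0, |value - origin| - offset)^2 / (2 * sigma^2)) with
    sigma^2 = -scale^2 / (2 * ln(decay)) simplifies to decay ** (d^2 / scale^2).
    """
    distance = max(0.0, abs(value - origin) - offset)
    return decay ** (distance ** 2 / scale ** 2)


def build_script_score_query(zipcode: str, street: str, house_number: int,
                             scale: float = 10.0, decay: float = 0.5,
                             offset: float = 0.0) -> dict:
    """Wrap the question's must clauses in a script_score query.

    Assumes a hypothetical integer field "number.numeric" holding the
    cleaned-up house number in every matching document.
    """
    inner = {
        "bool": {
            "must": [
                {"match": {"zipcode": {"query": zipcode, "fuzziness": "0"}}},
                {"match": {"street": {"query": street, "fuzziness": "auto"}}},
            ]
        }
    }
    return {
        "from": 0,
        "size": 1,
        "query": {
            "script_score": {
                "query": inner,
                "script": {
                    # Text relevance of the inner query times a Gaussian decay
                    # centred on the requested house number.
                    "source": (
                        "_score * decayNumericGauss(params.origin, params.scale, "
                        "params.offset, params.decay, doc['number.numeric'].value)"
                    ),
                    "params": {
                        "origin": float(house_number),
                        "scale": scale,
                        "offset": offset,
                        "decay": decay,
                    },
                },
            }
        },
    }


query = build_script_score_query("53113", "Adenauer Allee", 119)
print(json.dumps(query, indent=2))

# Preview the decay factor for the example documents ("10a" cleaned up to 10);
# doc2 gets by far the largest factor.
for doc_id, n in (("doc1", 11), ("doc2", 120), ("doc3", 10)):
    print(doc_id, gauss_decay(119, 10, 0, 0.5, n))
```

Multiplying by `_score` keeps the street/zipcode relevance in play; returning only the decay value would rank purely by house-number distance among the documents the `must` clauses match. The `should` clause on the raw `number` text is dropped, since the decay now carries that signal.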

In earlier versions of Elasticsearch, you can use the function_score query to implement the same logic (see the Elasticsearch reference: Function Score Query).
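For comparison, a sketch of the function_score variant, again as Python that builds the request body. It relies on the same assumptions as above: a hypothetical numeric field `number.numeric` with the cleaned-up house number, and illustrative `scale`/`decay`/`offset` values. It uses a `gauss` decay function inside `functions`, combined with the text score via `boost_mode: multiply` (which is also the default).

```python
import json


def build_function_score_query(zipcode: str, street: str, house_number: int,
                               scale: float = 10.0, decay: float = 0.5,
                               offset: float = 0.0) -> dict:
    """function_score variant: a gauss decay on the hypothetical numeric field
    "number.numeric", multiplied into the score of the must clauses."""
    inner = {
        "bool": {
            "must": [
                {"match": {"zipcode": {"query": zipcode, "fuzziness": "0"}}},
                {"match": {"street": {"query": street, "fuzziness": "auto"}}},
            ]
        }
    }
    return {
        "from": 0,
        "size": 1,
        "query": {
            "function_score": {
                "query": inner,
                "functions": [
                    {
                        "gauss": {
                            "number.numeric": {
                                "origin": house_number,
                                "scale": scale,
                                "offset": offset,
                                "decay": decay,
                            }
                        }
                    }
                ],
                # multiply is the default boost_mode; spelled out for clarity.
                "boost_mode": "multiply",
            }
        },
    }


print(json.dumps(build_function_score_query("53113", "Adenauer Allee", 119), indent=2))
```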

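Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.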

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM