
PHP Elastic Search Full Text Search - Sort by Relevance

I want to fetch "User" data using the "%LIKE%" condition in Elastic Search.

GET user/_search
{
    "query": {
        "query_string": {
            "fields": ["firstname", "lastname"],
            "query": "*a*"
        }
    },
    "sort": {
        "_score": "desc"
    }
}

It returns the results with "_score": 1 for all the data.

The data with name "Kunal Dethe" is first and "Abhijit Pingale" is second.

But I expected "Abhijit Pingale" to come first, because the letter "a" occurs twice in that name and only once in "Kunal Dethe".

Any ideas why?

I tried the "nGram" solution, but for a text like "ab" the grams are broken down into "a", "b" and "ab". This happens because "min_gram" is set to 1, so that results are returned even when a single character is entered.

But I want the search to be done as "ab" only.

Of course, I can increase "min_gram", but can it be set dynamically to the length of the searched text?

POST /user
{
    "settings": {
        "analysis": {
            "filter": {
                "substring": {
                    "type": "nGram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "substring_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "substring"
                    ]
                }
            }
        }
    },
    "mappings": {
        "user": {
            "properties": {
                "id": {
                    "type": "long"
                },
                "firstname": {
                    "type": "string",
                    "analyzer": "substring_analyzer"
                },
                "lastname": {
                    "type": "string",
                    "analyzer": "substring_analyzer"
                }
            }
        }
    }
}

Searching via:

GET user/_search
{
    "query": {
        "query_string": {
            "fields": ["firstname^2", "lastname"],
            "query": "ab"
        }
    }
}

One way of achieving what you want is to specify an analyzer (for example standard) at search time, so that your input is not analyzed by the ngram analyzer configured on the field. That way you will only match ab tokens, and neither a nor b tokens.

GET user/_search
{
    "query": {
        "query_string": {
            "fields": ["firstname^2", "lastname"],
            "query": "ab",
            "analyzer": "standard"
        }
    }
}
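To see why the search-time analyzer matters, here is a minimal Python sketch that imitates the behaviour described above. It is not Elasticsearch code: the helper names (ngrams, index_terms, matches) are hypothetical, whitespace splitting stands in for the standard tokenizer, and a match is assumed to need only one shared term, which approximates how the generated grams are OR'ed together.

```python
def ngrams(token, min_gram=1, max_gram=15):
    """Return every substring of `token` with a length in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text):
    """Rough stand-in for substring_analyzer: split, lowercase, then ngram."""
    terms = set()
    for word in text.lower().split():
        terms |= ngrams(word)
    return terms


def matches(query, text, ngram_query):
    """Return True if at least one query term is among the indexed terms."""
    if ngram_query:
        # Query analyzed with the ngram analyzer: "ab" becomes a, b, ab
        query_terms = index_terms(query)
    else:
        # Query analyzed with a plain analyzer: "ab" stays ab
        query_terms = set(query.lower().split())
    return bool(query_terms & index_terms(text))


names = ["Kunal Dethe", "Abhijit Pingale"]
ngram_hits = [n for n in names if matches("ab", n, ngram_query=True)]
standard_hits = [n for n in names if matches("ab", n, ngram_query=False)]

print(ngram_hits)     # both names, because each contains an "a"
print(standard_hits)  # only the name that contains "ab"
```

With the ngram analyzer applied to the query, "Kunal Dethe" matches through the single gram "a"; with a plain analyzer on the query side, only "Abhijit Pingale" is left.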

A better approach, however, is to set "search_analyzer": "standard" in your mapping. When only "analyzer": "substring_analyzer" is specified, that analyzer is used at search time as well as at index time, so the query text is also split into ngrams. With a separate search analyzer, a search for ab will only match ab tokens, because the query is not ngram'ed at search time.

"mappings": {
    "user": {
        "properties": {
            "id": {
                "type": "long"
            },
            "firstname": {
                "type": "string",
                "analyzer": "substring_analyzer",
                "search_analyzer": "standard"
            },
            "lastname": {
                "type": "string",
                "analyzer": "substring_analyzer",
                "search_analyzer": "standard"
            }
        }
    }
}
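For reference, the settings from the question and the mapping change above can be combined into one index-creation body. The sketch below only builds and prints that body in Python; the function name build_index_body is hypothetical, and nothing is sent to a cluster. It keeps the "string" field type and the "user" mapping type exactly as they appear in the question, which assumes an older Elasticsearch release (later versions replaced "string" with "text" and removed mapping types).

```python
import json


def build_index_body(fields=("firstname", "lastname")):
    """Build the index body: ngrams at index time, standard analyzer at search time."""
    analyzed_field = {
        "type": "string",
        "analyzer": "substring_analyzer",   # applied when documents are indexed
        "search_analyzer": "standard",      # applied to the query text
    }
    properties = {"id": {"type": "long"}}
    for name in fields:
        properties[name] = dict(analyzed_field)
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "substring": {"type": "nGram", "min_gram": 1, "max_gram": 15}
                },
                "analyzer": {
                    "substring_analyzer": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "substring"],
                    }
                },
            }
        },
        "mappings": {"user": {"properties": properties}},
    }


body = build_index_body()
print(json.dumps(body, indent=4))
```

The printed JSON can be pasted as the body of the POST /user request shown in the question.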

