
Elasticsearch exact match field

I have a field called url that is set to not_analyzed when I index it:

'url' => [
    'type' => 'string',
    'index' => 'not_analyzed'
]

Here is my method to determine if a URL already exists in the index:

public function urlExists($index, $type, $url) {
    $params = [
        'index' => $index,
        'type' => $type,
        'body' => [
            'query' => [
                'match' => [
                    'url' => $url
                ]
            ]
        ]
    ];

    $results = $this->client->count($params);

    return ($results['count'] > 0);
}

This seems to work fine, but I can't be 100% sure it is the correct way to find an exact match. According to the docs, another way to do the search is with params like these:

    $params = [
        'index' => $index,
        'type' => $type,
        'body' => [
            'query' => [
                'filtered' => [
                    'filter' => [
                        'term' => [
                            'url' => $url
                        ]
                    ]
                ]
            ]
        ]
    ];

My question is: would both sets of params work the same way for a not_analyzed field?

The second query is the right approach. Term-level queries and filters should be used for exact matching. The biggest advantage of the filter is caching: Elasticsearch can store the filter's result as a bitset and reuse it, so subsequent calls with the same filter get a quicker response.
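As a minimal sketch of that recommendation, the method from the question could be rewritten around the term filter. `buildTermCountParams()` is a hypothetical helper introduced here for illustration (it is not part of the elasticsearch-php client), and `$client` is assumed to be the same client object the question calls `count()` on.

```php
<?php
// Hypothetical helper: builds the count request using the term filter
// from the question's second example (filtered query syntax).
function buildTermCountParams(string $index, string $type, string $url): array
{
    return [
        'index' => $index,
        'type'  => $type,
        'body'  => [
            'query' => [
                'filtered' => [
                    'filter' => [
                        // term compares the stored value as-is, no analysis
                        'term' => ['url' => $url],
                    ],
                ],
            ],
        ],
    ];
}

// $client is assumed to expose count(array $params), as in the question.
function urlExists($client, string $index, string $type, string $url): bool
{
    $results = $client->count(buildTermCountParams($index, $type, $url));

    return $results['count'] > 0;
}
```

Keeping the params in a separate function makes the request body easy to inspect or unit test without a running cluster.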

From the Docs

Exclude as many document as you can with a filter, then query just the documents that remain.

Also, if you look at the output of a search, you will find that the _score of every document is 1, because scoring is not applied to filters; the same goes for highlighting. With a match query you will see varying _score values. Again, from the docs:

Keep in mind that once you wrap a query as a filter, it loses query features like highlighting and scoring because these are not features supported by filters.

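Your first query uses match, which is mainly intended for analyzed fields. For example, when you want both Google and google to match every document containing google (case-insensitive), a match query is the right choice. On a not_analyzed field the query text is not analyzed either, so match returns the same documents as term here, but the term filter states the intent directly and skips scoring.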
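One caveat on versions: the filtered query used in the term example was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause plays the same role. The sketch below shows that form under the assumption that you are on a newer cluster; `buildBoolFilterCountParams()` is a hypothetical helper name, and the 'type' parameter is left out because recent versions no longer use mapping types.

```php
<?php
// Hypothetical helper: same exact-match check, expressed with bool/filter
// for Elasticsearch versions where the filtered query no longer exists.
function buildBoolFilterCountParams(string $index, string $url): array
{
    return [
        'index' => $index,
        'body'  => [
            'query' => [
                'bool' => [
                    // filter clauses match without contributing to _score
                    'filter' => [
                        ['term' => ['url' => $url]],
                    ],
                ],
            ],
        ],
    ];
}
```

The resulting array can be passed to the client's count() call in the same way as the params in the question.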

Hope this helps!!


 