
How to match on prefix in Elasticsearch

Let's say that in my Elasticsearch index I have a field called "dots" that contains a string of punctuation-separated words (e.g. "first.second.third").

I need to search for, e.g., "first.second" and get all entries whose "dots" field is either exactly "first.second" or starts with "first.second.".

I have trouble understanding how text querying works; at least, I have not been able to write a query that does the job.

Elasticsearch has a Path Hierarchy Tokenizer that was created for exactly this use case. Here is an example of how to set it up for your index:

# Create a new index with custom path_hierarchy analyzer 
# See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html
curl -XPUT "localhost:9200/prefix-test" -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "prefix-test-analyzer": {
                    "type": "custom",
                    "tokenizer": "prefix-test-tokenizer"
                }
            },
            "tokenizer": {
                "prefix-test-tokenizer": {
                    "type": "path_hierarchy",
                    "delimiter": "."
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "dots": {
                    "type": "string",
                    "analyzer": "prefix-test-analyzer",
                    "search_analyzer": "keyword"
                }
            }
        }
    }
}'
echo
# Put some test data
curl -XPUT "localhost:9200/prefix-test/doc/1" -d '{"dots": "first.second.third"}'
curl -XPUT "localhost:9200/prefix-test/doc/2" -d '{"dots": "first.second.foo-bar"}'
curl -XPUT "localhost:9200/prefix-test/doc/3" -d '{"dots": "first.baz.something"}'
curl -XPOST "localhost:9200/prefix-test/_refresh"
echo
# Test searches. 
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
    "query": {
        "term": {
            "dots": "first"
        }
    }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
    "query": {
        "term": {
            "dots": "first.second"
        }
    }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
    "query": {
        "term": {
            "dots": "first.second.foo-bar"
        }
    }
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true&q=dots:first.second"
echo
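
To see why the term queries above match, it can help to look at the tokens the tokenizer emits. The following is a minimal Python sketch, not Elasticsearch code: `path_hierarchy_tokens` and `term_query` are hypothetical helpers that imitate a path_hierarchy tokenizer with delimiter "." and an exact term lookup, using the three test documents from the script.

```python
from typing import Dict, List, Set


def path_hierarchy_tokens(text: str, delimiter: str = ".") -> List[str]:
    """Imitate path_hierarchy: emit every prefix that ends at a delimiter boundary."""
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# The three test documents from the script above.
docs = {
    1: "first.second.third",
    2: "first.second.foo-bar",
    3: "first.baz.something",
}

# Index time: each value is expanded into all of its dotted prefixes.
index: Dict[int, Set[str]] = {
    doc_id: set(path_hierarchy_tokens(value)) for doc_id, value in docs.items()
}


def term_query(term: str) -> List[int]:
    """A term query is not analyzed: it matches only an identical indexed token."""
    return sorted(doc_id for doc_id, tokens in index.items() if term in tokens)


print(path_hierarchy_tokens("first.second.third"))
print(term_query("first"))                 # [1, 2, 3]
print(term_query("first.second"))          # [1, 2]
print(term_query("first.second.foo-bar"))  # [2]
```

Because every dotted prefix is stored as its own token, an exact lookup of "first.second" finds both the exact value and anything underneath it, which is what the question asks for.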

Have a look at prefix queries.

$ curl -XGET 'http://localhost:9200/index/type/_search' -d '{
    "query" : {
        "prefix" : { "dots" : "first.second" }
    }
}'
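
One caveat worth checking: a plain prefix match is slightly broader than what the question asks for. The sketch below is plain Python, not Elasticsearch, and assumes the field is indexed as a single unanalyzed term; `prefix_match`, `dotted_prefix_match`, and the value "first.secondary.x" are hypothetical, made up for illustration.

```python
from typing import List


def prefix_match(value: str, prefix: str) -> bool:
    """Plain prefix semantics: any value that starts with the prefix."""
    return value.startswith(prefix)


def dotted_prefix_match(value: str, prefix: str, delimiter: str = ".") -> bool:
    """What the question asks for: the exact value, or the prefix followed by a dot."""
    return value == prefix or value.startswith(prefix + delimiter)


# Hypothetical field values; "first.secondary.x" is the interesting case.
values: List[str] = [
    "first.second",
    "first.second.third",
    "first.secondary.x",
    "first.baz",
]

print([v for v in values if prefix_match(v, "first.second")])
print([v for v in values if dotted_prefix_match(v, "first.second")])
```

The first list includes "first.secondary.x" while the second does not, so if such values can occur, the prefix query alone may return more than intended.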

There is also a much easier way, as pointed out in the Elasticsearch documentation:

Just use:

{
    "text_phrase_prefix" : {
        "fieldname" : "yourprefix"
    }
}

or, since 0.19.9:

{
    "match_phrase_prefix" : {
        "fieldname" : "yourprefix"
    }
}

instead of:

{
    "prefix" : {
        "fieldname" : "yourprefix"
    }
}

You should use wildcard characters in your query, something like this:

$ curl -XGET 'http://localhost:9200/myapp/_search' -d '{
    "query": {
        "query_string": { "query": "dots:first.second*" }
    }
}'

More examples of the syntax are at: http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html

I was looking for a similar solution, but one that matches only a prefix. @imtov's answer got me almost there, with one change: switching the analyzers around.

"mappings": {
    "doc": {
        "properties": {
            "dots": {
                "type": "string",
                "analyzer": "keyword",
                "search_analyzer": "prefix-test-analyzer"
            }
        }
    }
}

instead of

"mappings": {
    "doc": {
        "properties": {
            "dots": {
                "type": "string",
                "index_analyzer": "prefix-test-analyzer",
                "search_analyzer": "keyword"
            }
        }
    }
}

This way, adding:

'{"dots": "first.second"}'
'{"dots": "first.third"}'

will add only these full tokens, without storing the first, second, and third tokens.

Yet searching for either

first.second.anyotherstring
first.second

will correctly return only the first entry:

'{"dots": "first.second"}'
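
The effect of swapping the analyzers can be sketched in a few lines of Python. This is an illustration only, not Elasticsearch code: `path_hierarchy_tokens` and `search` are hypothetical helpers, and the sketch assumes the query text goes through the search analyzer (as a match query does; a term query is not analyzed).

```python
from typing import List


def path_hierarchy_tokens(text: str, delimiter: str = ".") -> List[str]:
    """Imitate path_hierarchy: emit every prefix that ends at a delimiter boundary."""
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]) for i in range(len(parts))]


# Index side: the keyword analyzer keeps each value as one single token.
indexed: List[str] = ["first.second", "first.third"]


def search(query: str) -> List[str]:
    """Search side: expand the query into its dotted prefixes, then match whole values."""
    query_tokens = set(path_hierarchy_tokens(query))
    return [value for value in indexed if value in query_tokens]


print(search("first.second.anyotherstring"))  # ['first.second']
print(search("first.second"))                 # ['first.second']
print(search("first"))                        # []
```

In other words, this setup finds stored values that are a prefix of the query, which is the reverse of the original question: the stored value "first.second" is found by the longer query, but a query for "first" finds nothing.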

This is not exactly what you asked for, but it is related, so I thought it could help someone.
