ElasticSearch with hunspell analyzer

I'd like to create an index in ElasticSearch that stores a specific type of data with some string fields. The language is Hungarian.

I ran an HTTP PUT request with the following body:

{
    "settings" : {  
        "analysis" : {
            "analyzer" : {
                "hu" : {
                    "tokenizer" : "standard",
                    "filter" : [ "lowercase", "hu_HU" ]         
                }
            },
            "filter" : {
                "hu_HU" : {
                    "type" : "hunspell",
                    "locale" : "hu_HU",
                    "language" : "hu_HU"
                }
            }       
        }
    },
    "mappings": {
        "printedArticle": {
            "_source": {"enabled": false},
            "properties": {
                "_id": {"type": "string", "store": true},
                "mysqlid": {"type": "long", "store": false},
                "publishDate": {"type": "date", "format": "dateOptionalTime", "store": false},
                "title": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "lead": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "content": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "participants": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "authors": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "subtitle": {"type": "string", "analyzer": "hu", "analyze": true, "store": false}
            }
        }
    }   
}
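
The hunspell filter loads its dictionaries from the node's configuration directory, so a layout roughly like the following is assumed for the hu_HU locale (the standard Hunspell .aff/.dic pair):

config/
    hunspell/
        hu_HU/
            hu_HU.aff
            hu_HU.dic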

Then I inserted one record with some test text. If I run a search through the Elasticsearch API with a GET request like this:

http://localhost:9200/mf_pa/_search?q=MYTESTTEXT

it finds my record only if my query text exactly matches one of the words in my record.
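
For reference, a search can also target one of the analyzed fields directly with a request-body match query rather than the URI q= parameter (which queries the _all field by default); a minimal sketch, with title and the query text as placeholders:

GET /mf_pa/_search
{
    "query": {
        "match": {
            "title": "MYTESTTEXT"
        }
    }
}

A match query runs the query text through the analyzer configured for that field.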

I tried to analyze some similar text through the Analyze API:

http://localhost:9200/mf_pa/_analyze?analyzer=hu&text=My text to tokenize

and it tokenized my test text properly. Based on this, I'd expect that if I put one of the resulting tokens into my search query, it would find the record, but it does not.

As an English example: my text is 'unforgettable' and my query is 'forget'. What should I do to find the record?

If the analyzer checks out when tested with the Analyze API, it should also work in the mapping. Here are some things to check:

  1. Make sure the mapping was applied successfully: GET /mf_pa/_mapping

    For example, "analyze": true should be "index": "analyzed" (see the field mapping sketch after this list).

  2. Make sure that the test document was actually correctly indexed as type printedArticle.

    GET /mf_pa/_search should return your test doc showing "_type": "printedArticle".

  3. You can also use the Analyze API to validate how text will be analyzed against a specific field (to ensure the analyzer is correctly applied to that field).

    e.g. GET /mf_pa/_analyze?field=title&text=A kőszívű ember fiai
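
As a sketch of point 1: on Elasticsearch 1.x a string field mapping for this index would typically look like the following; "analyze" is not a standard mapping option, and "index": "analyzed" (the default for string fields) is the setting that controls whether the field is analyzed:

    "title": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "hu",
        "store": false
    }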
