ElasticSearch with hunspell analyzer

I'd like to create an index in ElasticSearch that stores a specific type of data with some string fields. The language is Hungarian.

I ran an HTTP PUT request with the following body:

{
    "settings" : {  
        "analysis" : {
            "analyzer" : {
                "hu" : {
                    "tokenizer" : "standard",
                    "filter" : [ "lowercase", "hu_HU" ]         
                }
            },
            "filter" : {
                "hu_HU" : {
                    "type" : "hunspell",
                    "locale" : "hu_HU",
                    "language" : "hu_HU"
                }
            }       
        }
    },
    "mappings": {
        "printedArticle": {
            "_source": {"enabled": false},
            "properties": {
                "_id": {"type": "string", "store": true},
                "mysqlid": {"type": "long", "store": false},
                "publishDate": {"type": "date", "format": "dateOptionalTime", "store": false},
                "title": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "lead": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "content": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "participants": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "authors": {"type": "string", "analyzer": "hu", "analyze": true, "store": false},
                "subtitle": {"type": "string", "analyzer": "hu", "analyze": true, "store": false}
            }
        }
    }   
}
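
The hunspell filter loads its dictionaries from the node's configuration directory, so a layout roughly like the following is assumed for the hu_HU locale (the standard Hunspell .aff/.dic pair):

config/
    hunspell/
        hu_HU/
            hu_HU.aff
            hu_HU.dic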

Then I inserted one record with some test text. If I run a search through the Elasticsearch API with a GET request like this:

http://localhost:9200/mf_pa/_search?q=MYTESTTEXT

it finds my record only if my query text exactly matches one of the words in my record.
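
For reference, a search can also target one of the analyzed fields directly with a request-body match query rather than the URI q= parameter (which queries the _all field by default); a minimal sketch, with title and the query text as placeholders:

GET /mf_pa/_search
{
    "query": {
        "match": {
            "title": "MYTESTTEXT"
        }
    }
}

A match query runs the query text through the analyzer configured for that field.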

I tried to analyze some similar text through the Analyze API:

http://localhost:9200/mf_pa/_analyze?analyzer=hu&text=My text to tokenize

and it tokenized my test text properly. Based on this, I'd expect that if I put one of the resulting tokens into my search query, it would find the record, but it does not.

As an English example: my text is 'unforgettable' and my query is 'forget'. What should I do to find the record?

If the analyzer checks out when tested with the Analyze API, it should also work in the mapping. Here are some things to check:

  1. Make sure the mapping was applied successfully: GET /mf_pa/_mapping

    For example, "analyze": true should be "index": "analyzed" (see the field mapping sketch after this list).

  2. Make sure that the test document was actually correctly indexed as type printedArticle.

    GET /mf_pa/_search should return your test doc showing "_type": "printedArticle".

  3. You can also use the Analyze API to validate how text will be analyzed against a specific field (to ensure the analyzer is correctly applied to that field).

    e.g. GET /mf_pa/_analyze?field=title&text=A kőszívű ember fiai
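
As a sketch of point 1: on Elasticsearch 1.x a string field mapping for this index would typically look like the following; "analyze" is not a standard mapping option, and "index": "analyzed" (the default for string fields) is the setting that controls whether the field is analyzed:

    "title": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "hu",
        "store": false
    }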
