Behavior of elasticsearch-analysis-kuromoji is not what I expected

I have been using elasticsearch-analysis-kuromoji to search Japanese text, but I am seeing two very strange behaviours. First, some search terms do not work: '輸出貿易' does not match unless I pass it as '輸 出 貿 易', with a space between each character. Second, character sequences like 'ント' are not matched at all.

This is my configuration:

            .setSettings(ImmutableSettings.settingsBuilder().loadFromSource(jsonBuilder()
                    .startObject()
                        .startObject("analysis")
                            // Custom tokenizer based on kuromoji_tokenizer
                            .startObject("tokenizer")
                                .startObject("kuromoji_user_dict")
                                    .field("type", "kuromoji_tokenizer")
                                    .field("mode", "extended")
                                    .field("discard_punctuation", "false")
                                .endObject()
                            .endObject()
                            // Custom analyzer that uses the tokenizer defined above
                            .startObject("analyzer")
                                .startObject(JAPANESE_LANGUAGE_ANALYSIS)
                                    .field("type", "custom")
                                    .field("tokenizer", "kuromoji_user_dict")
                                .endObject()
                            .endObject()
                        .endObject()
                    .endObject().string()));

Am I configuring it wrong, or do I need a different tokenizer for text like '輸出貿易' and 'ント'?

Thank you

After some online research and some help from the elasticsearch-analysis-kuromoji team, I was able to find the problem. Even though I had created the analyzer and told the query to use it, I also needed to add a mapping that assigns the analyzer to the field, like so:

XContentBuilder xbMapping =
        jsonBuilder()
                .startObject()
                .startObject(indexType)
                .startObject("properties")
                .startObject("source")
                .field("type", "string")
                .endObject()
                .startObject("text")
                .field("type", "string")
                .field("analyzer", JAPANESE_LANGUAGE_ANALYSIS)
                .endObject()
                .endObject()
                .endObject()
                .endObject();

elasticSearchClient.admin().indices()
        .preparePutMapping(indexName)
        .setType(indexType)
        .setSource(xbMapping)
        .execute().get();
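
Why the mapping matters can be illustrated with a toy model. The sketch below is not Kuromoji or Lucene code. It assumes that, without the mapping, the field was indexed by a default analyzer that emits each kanji as its own token, while the query side used a dictionary-based analyzer. The class name, both tokenizers, and the two-word dictionary are hypothetical stand-ins chosen only to reproduce the symptom from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Toy model of an index-time / query-time analyzer mismatch.
// Nothing here calls Elasticsearch; both tokenizers are simplified assumptions.
class AnalyzerMismatchSketch {

    // Hypothetical two-word dictionary standing in for a real morphological dictionary.
    static final List<String> DICTIONARY = Arrays.asList("輸出", "貿易");

    // Assumed default behaviour: every non-whitespace character becomes one token.
    static List<String> perCharacterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // Greedy longest match against DICTIONARY; unknown characters become single tokens.
    static List<String> dictionaryTokens(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                i += Character.charCount(cp);
                continue;
            }
            String best = null;
            for (String word : DICTIONARY) {
                if (text.startsWith(word, i) && (best == null || word.length() > best.length())) {
                    best = word;
                }
            }
            if (best == null) {
                best = new String(Character.toChars(cp));
            }
            tokens.add(best);
            i += best.length();
        }
        return tokens;
    }

    // A query matches in this model only if every query token exists among the indexed tokens.
    static boolean allQueryTokensIndexed(List<String> indexed, List<String> query) {
        return new HashSet<>(indexed).containsAll(query);
    }

    public static void main(String[] args) {
        String doc = "輸出貿易";

        // No mapping: index side is per-character, query side is dictionary-based.
        List<String> indexedWithoutMapping = perCharacterTokens(doc);
        System.out.println(allQueryTokensIndexed(indexedWithoutMapping, dictionaryTokens("輸出貿易")));
        System.out.println(allQueryTokensIndexed(indexedWithoutMapping, dictionaryTokens("輸 出 貿 易")));

        // With mapping: both sides use the same analyzer.
        List<String> indexedWithMapping = dictionaryTokens(doc);
        System.out.println(allQueryTokensIndexed(indexedWithMapping, dictionaryTokens("輸出貿易")));
    }
}
```

In this model the three checks print false, true, and true: the unspaced query fails against the per-character index, the spaced query happens to line up with it, and once both sides share one analyzer the natural query matches. The real tokenizers are more involved, so treat this only as an intuition for why the analyzer has to be set on the field mapping and not just on the query.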
