
Elasticsearch Analyzers for Text Analysis

I am new to Elasticsearch and want to use it for a full-text search engine. For text analysis I need to work with (multilingual) language analyzers. Elasticsearch offers built-in language analyzers, but I am not sure whether they cover preprocessing steps such as removing stop words, stemming, and removing unwanted characters. I will be working with multi-fields, because all (description) languages are indexed in the same field in a document. Is a mapping like this correct in this case?

{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "german"
      },
      "description": {
        "type": "text",
        "analyzer": "french"
      }
    }
  }
}

I am confused about how to use language analyzers to analyze the input text, and about when to use mappings versus settings.
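For reference, JSON does not allow the same key (`description`) three times, so a mapping like the one above would not work as written. The multi-field approach keeps one top-level `description` field and attaches per-language sub-fields under `fields`. A sketch (the sub-field names `en`, `de`, and `fr` are illustrative):

```json
PUT /my-index
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "en": { "type": "text", "analyzer": "english" },
          "de": { "type": "text", "analyzer": "german" },
          "fr": { "type": "text", "analyzer": "french" }
        }
      }
    }
  }
}
```

At query time you can then target `description.en`, `description.de`, or `description.fr` individually, or search several of them at once with a `multi_match` query.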

If you use the predefined language analyzers, they internally use the corresponding language's stop words (the list is mentioned here), and you can also define your own custom stop words with them.
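A built-in language analyzer can be reconfigured with a custom stop-word list via its `stopwords` parameter in the index settings. A sketch (the analyzer name and the word list are illustrative):

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_with_custom_stops": {
          "type": "english",
          "stopwords": ["a", "an", "the", "product"]
        }
      }
    }
  }
}
```

You can then reference `english_with_custom_stops` by name in a field mapping just like any built-in analyzer.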

For example, this Lucene code (Lucene is used by Elasticsearch internally) shows the stop words for the english analyzer, but if you want to add more words you can do that as well.

For stemming, as mentioned in the stemmer token filter's official documentation, you can use the stemmer token filter and customize it; the documentation also lists the supported languages.
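The stemmer token filter takes a `language` parameter selecting the stemming algorithm. A sketch wiring a German light stemmer into a custom analyzer (the filter and analyzer names are illustrative; `light_german` is one of the documented language values):

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "german_light_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      },
      "analyzer": {
        "german_stemmed": {
          "tokenizer": "standard",
          "filter": ["lowercase", "german_light_stemmer"]
        }
      }
    }
  }
}
```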

Also, an analyzer goes through a three-phase process (character filters, tokenizer, and token filters), and the built-in language analyzers have all of these preconfigured; if you want, you can supply your own components and combine them using a custom analyzer.
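A custom analyzer makes the three phases explicit: character filters run first, then one tokenizer, then token filters in order. A sketch using only built-in components (the analyzer name is illustrative):

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "stop", "porter_stem"]
        }
      }
    }
  }
}
```

You can inspect what such an analyzer produces with the `_analyze` API, e.g. `POST /my-index/_analyze` with a body specifying `"analyzer": "my_custom_analyzer"` and a sample `"text"`, which is a quick way to verify the stop-word removal and stemming behave as expected before indexing real documents.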
