
ElasticSearch Tokenizer keywords

I'm wondering how Elasticsearch tokenizes keywords. For example: I'm using a search box to search for keywords in comments. When I search for "Zelle", only comments in Spanish show up. But if I search for "Zell", all comments containing "Zelle" show up, with "Zell" highlighted. Can anyone please tell me why searching for some keywords returns only comments in specific languages?

Edit1: The mapping is like this:

 {
  "comments" : {
    "mappings" : {
      "ios" : {
        "properties" : {
          "content" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "country" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "date" : {
            "type" : "date"
          },
          "language" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "product_id" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "product_version" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "rating" : {
            "type" : "long"
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "user_language" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

and it does not contain any analyzer or tokenizer settings. How can I find out which tokenizer Elasticsearch uses when searching?
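One way to check is to ask the cluster directly. A minimal sketch using the `_mapping` and `_analyze` APIs (assuming Elasticsearch is reachable at `localhost:9200`; the index name `comments` is taken from the mapping above):

```shell
# Show the full mapping, including any "analyzer" set on a field.
# If a text field lists no analyzer, it uses the default "standard" analyzer.
curl -X GET "localhost:9200/comments/_mapping?pretty"

# Show the exact tokens a field's analyzer produces for a sample string
curl -X GET "localhost:9200/comments/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"field": "content", "text": "Zelle"}'
```

The second request answers the question directly: it runs the sample text through whatever analyzer the `content` field actually uses and returns the resulting tokens.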

I recommend you read the Mapping chapter of the official documentation; it will help you a lot.

To answer your question, we need to know the mapping of your documents, specifically the mapping of the field you search in.

By the look of it, you are not using the default analyzer (called "standard"), because with it "Zell" would not match "Zelle".

In Elasticsearch, analyzers tokenize your content the way you want. And by the look of it, some analyzer is set up in your mapping, because "Zelle" and "Zell" do match.
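You can compare analyzers side by side with the `_analyze` API. A sketch (assuming Elasticsearch at `localhost:9200`; `german` is just one example of a language analyzer whose stemmer may reduce "Zelle" to a shorter token, which would explain "Zell" matching "Zelle"):

```shell
# Standard analyzer: lowercases but does not stem,
# so "Zelle" is indexed as the single token "zelle"
curl -X GET "localhost:9200/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"analyzer": "standard", "text": "Zelle"}'

# A language analyzer applies stemming; the "german" analyzer,
# for instance, may strip the trailing "-e" so that "Zelle" and
# a query for "Zell" reduce to the same term
curl -X GET "localhost:9200/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"analyzer": "german", "text": "Zelle"}'
```

If the two responses differ, that difference is exactly what determines which queries match which documents.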
