Elasticsearch, how to concatenate words then ngram it?

I'd like to concatenate words and then ngram them. What's the correct setting for Elasticsearch?

In English,

from: stack overflow

==> stackoverflow : concatenate first,

==> sta / tac / ack / cko / kov / ... etc. (min_gram: 3, max_gram: 10)

To do the concatenation I'm assuming that you just want to remove all spaces from your input data. To do this, you need to implement a pattern_replace char filter that replaces each space with nothing: with the pattern "\u0020" (a single space) and an empty replacement, the char filter turns "stack overflow" into "stackoverflow" before the tokenizer ever sees it.

Setting up the ngram tokenizer should be easy: just specify your minimum and maximum token lengths.

It's worth adding a lowercase token filter too, to make searching case-insensitive.

curl -XPOST localhost:9200/my_index -d '{
  "index": {
    "analysis": {
      "analyzer": {
        "my_new_analyzer": {
          "type": "custom",
          "char_filter": ["my_pattern"],
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "char_filter": {
        "my_pattern": {
          "type": "pattern_replace",
          "pattern": "\u0020",
          "replacement": ""
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "3",
          "max_gram": "10",
          "token_chars": ["letter", "digit", "punctuation", "symbol"]
        }
      }
    }
  }
}'

Testing this:

curl -XGET 'localhost:9200/my_index/_analyze?analyzer=my_new_analyzer&pretty' -d 'stack overflow'

gives the following (only a small part is shown below):

{
"tokens" : [ {
  "token" : "sta",
  "start_offset" : 0,
  "end_offset" : 3,
  "type" : "word",
  "position" : 1
}, {
  "token" : "stac",
  "start_offset" : 0,
  "end_offset" : 4,
  "type" : "word",
  "position" : 2
}, {
  "token" : "stack",
  "start_offset" : 0,
  "end_offset" : 6,
  "type" : "word",
  "position" : 3
}, {
  "token" : "stacko",
  "start_offset" : 0,
  "end_offset" : 7,
  "type" : "word",
  "position" : 4
}, {
  "token" : "stackov",
  "start_offset" : 0,
  "end_offset" : 8,
  "type" : "word",
  "position" : 5
}, {
  ...
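
To actually use this analyzer you still need to attach it to a field in your mapping. Here is a minimal sketch, assuming a type named my_type and a field named title (both names are placeholders, not part of the original answer):

# apply the custom analyzer to a (hypothetical) title field
curl -XPUT 'localhost:9200/my_index/_mapping/my_type' -d '{
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "analyzer": "my_new_analyzer"
      }
    }
  }
}'

Once documents are indexed with this mapping, an ordinary match query should find substrings of three characters or more, for example:

# "ckov" is an inner fragment of "stackoverflow"
curl -XGET 'localhost:9200/my_index/_search?pretty' -d '{
  "query": {
    "match": {
      "title": "ckov"
    }
  }
}'

Note that the query string is run through the same ngram analyzer at search time, so short queries can match more documents than you expect; if that matters, configure a separate search_analyzer without ngrams.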
