
Mapping analyser for splitting string in Elastic search

Is it possible to create a mapping analyser that splits a string into smaller parts based on character counts?

For example, let's say I have the string "ABCD1E2F34". It is a token assembled from several smaller codes, and I want to break it back down into those codes.

If I know for sure that:

- the first code is always 4 characters ("ABCD")
- the second is 3 characters ("1E2")
- the third is 1 character ("F")
- the fourth is 2 characters ("34")

Can I create a mapping analyser for a field that maps the string like this? If I set the field "bigCode" to the value "ABCD1E2F34", I would be able to access it like this:

bigCode.full ("ABCD1E2F34")
bigCode.first ("ABCD")
bigCode.second ("1E2")
... 

Thanks a lot!
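Since the part lengths are fixed, the desired decomposition is just offset arithmetic. As a minimal sketch of the intended behaviour outside Elasticsearch (the lengths are taken from the question; `split_code` is a hypothetical helper, not an Elasticsearch API):

```python
def split_code(code, lengths=(4, 3, 1, 2)):
    """Split a fixed-format code into parts by character counts."""
    parts, pos = [], 0
    for n in lengths:
        parts.append(code[pos:pos + n])
        pos += n
    return parts

print(split_code("ABCD1E2F34"))  # ['ABCD', '1E2', 'F', '34']
```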

What do you think about the pattern tokenizer? I created a regex that splits the string into tokens: (?<=(^\\w{4}))|(?<=^\\w{4}(\\w{3}))|(?<=^\\w{4}\\w{3}(\\w{1}))|(?<=^\\w{4}\\w{3}\\w{1}(\\w{2})). After that I created an analyzer like this:

PUT /myindex
{
    "settings": {
        "analysis": {
          "analyzer": {
            "codeanalyzer": {
              "type": "pattern",
              "pattern":"(?<=(^\\w{4}))|(?<=^\\w{4}(\\w{3}))|(?<=^\\w{4}\\w{3}(\\w{1}))|(?<=^\\w{4}\\w{3}\\w{1}(\\w{2}))"
            }
          }
        }
    }
}

POST /myindex/_analyze
{
    "analyzer": "codeanalyzer",
    "text": "ABCD1E2F34"
}

And the result is the tokenized data (note that the pattern analyzer lowercases tokens by default, which is why "ABCD" comes back as "abcd"):

{
  "tokens": [
    {
      "token": "abcd",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "1e2",
      "start_offset": 4,
      "end_offset": 7,
      "type": "word",
      "position": 1
    },
    {
      "token": "f",
      "start_offset": 7,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": "34",
      "start_offset": 8,
      "end_offset": 10,
      "type": "word",
      "position": 3
    }
  ]
}
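The split can be sanity-checked outside Elasticsearch as well. Python's `re` accepts these fixed-width lookbehinds; in the sketch below the capture groups are dropped (they would otherwise leak into `re.split`'s output) and each alternative is collapsed to a single overall width:

```python
import re

# Zero-width split points after 4, 7 and 8 characters, mirroring the
# lookbehind alternatives in the Elasticsearch pattern above.
pattern = r"(?<=^\w{4})|(?<=^\w{7})|(?<=^\w{8})"

print(re.split(pattern, "ABCD1E2F34"))  # ['ABCD', '1E2', 'F', '34']
```

Splitting on zero-width matches requires Python 3.7 or later.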

You can also check the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
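One detail left implicit above is attaching the analyzer to the field itself; the settings only define it. A hedged sketch of what that wiring might look like (typeless mapping syntax from Elasticsearch 7.x+; older versions nest the properties under a type name):

```
PUT /myindex
{
    "settings": {
        "analysis": {
          "analyzer": {
            "codeanalyzer": {
              "type": "pattern",
              "pattern":"(?<=(^\\w{4}))|(?<=^\\w{4}(\\w{3}))|(?<=^\\w{4}\\w{3}(\\w{1}))|(?<=^\\w{4}\\w{3}\\w{1}(\\w{2}))"
            }
          }
        }
    },
    "mappings": {
        "properties": {
            "bigCode": {
                "type": "text",
                "analyzer": "codeanalyzer"
            }
        }
    }
}
```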
