Mapping analyser for splitting a string in Elasticsearch
Is it possible to create a mapping analyser that splits a string into smaller parts based on character count?
For example, let's say I have the string "ABCD1E2F34". This is a token constructed from multiple smaller codes, and I want to break it back down into those codes.
If I know for sure that:
- The first code is always 4 characters ("ABCD")
- The second is 3 characters ("1E2")
- The third is 1 character ("F")
- The fourth is 2 characters ("34")
Can I create a mapping analyser for a field that will map the string like this? If I set the field "bigCode" to the value "ABCD1E2F34", I want to be able to access it like this:
bigCode.full ("ABCD1E2F34")
bigCode.first ("ABCD")
bigCode.second ("1E2")
...
Thanks a lot!
What do you think about the Pattern tokenizer? I wrote a regex that splits the string into tokens: (?<=(^\w{4}))|(?<=^\w{4}(\w{3}))|(?<=^\w{4}\w{3}(\w{1}))|(?<=^\w{4}\w{3}\w{1}(\w{2})) (each backslash has to be doubled when the pattern is written inside a JSON string). After that I created an analyzer like this:
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "codeanalyzer": {
          "type": "pattern",
          "pattern": "(?<=(^\\w{4}))|(?<=^\\w{4}(\\w{3}))|(?<=^\\w{4}\\w{3}(\\w{1}))|(?<=^\\w{4}\\w{3}\\w{1}(\\w{2}))"
        }
      }
    }
  }
}
POST /myindex/_analyze?analyzer=codeanalyzer&text=ABCD1E2F34
And the result is the tokenized data (the pattern analyzer lowercases tokens by default, which is why "ABCD" comes back as "abcd"):
{
  "tokens": [
    {
      "token": "abcd",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "1e2",
      "start_offset": 4,
      "end_offset": 7,
      "type": "word",
      "position": 1
    },
    {
      "token": "f",
      "start_offset": 7,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": "34",
      "start_offset": 8,
      "end_offset": 10,
      "type": "word",
      "position": 3
    }
  ]
}
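If you want to sanity-check the split positions without a running cluster, here is a rough Python sketch of the same idea. It is only an approximation of what the analyzer does: Python's re module is not the Java regex engine Elasticsearch uses, I dropped the capturing groups (re.split would return them as extra items), and the helper name split_code is my own. It needs Python 3.7+ because it splits on zero-width matches.

```python
import json
import re

# Zero-width lookbehinds: split after the 4th, 7th, 8th and 10th character.
# Capturing groups from the analyzer pattern are omitted on purpose.
PATTERN = (
    r"(?<=^\w{4})"
    r"|(?<=^\w{4}\w{3})"
    r"|(?<=^\w{4}\w{3}\w{1})"
    r"|(?<=^\w{4}\w{3}\w{1}\w{2})"
)


def split_code(text: str) -> list:
    """Split at the boundaries and lowercase, mimicking the pattern analyzer's default."""
    # Drop empty pieces, such as the one produced by the split at the very end.
    return [part.lower() for part in re.split(PATTERN, text) if part]


tokens = split_code("ABCD1E2F34")
print(tokens)

# Serialising to JSON doubles every backslash, which is the form used in the index settings.
settings_fragment = json.dumps({"pattern": PATTERN})
print(settings_fragment)
```

The second print shows why the pattern in the PUT request is written with \\w rather than \w.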
You can also check the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
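One caveat: the analyzer gives you four tokens inside a single field, not separately named parts such as bigCode.first. If you really need the named parts, one alternative (a different technique from the analyzer above) is to split the value in your application before indexing. A minimal sketch, assuming the widths from the question; the names "third" and "fourth" and the helper split_big_code are hypothetical, since the question only names "full", "first" and "second":

```python
# Hypothetical layout: (part name, width) pairs based on the widths in the question.
LAYOUT = [("first", 4), ("second", 3), ("third", 1), ("fourth", 2)]


def split_big_code(code: str) -> dict:
    """Return the full code plus its fixed-width parts, keyed by name."""
    expected = sum(width for _, width in LAYOUT)
    if len(code) != expected:
        raise ValueError(f"expected {expected} characters, got {len(code)}")

    parts = {"full": code}
    start = 0
    for name, width in LAYOUT:
        parts[name] = code[start:start + width]
        start += width
    return parts


# The resulting document body could then be indexed as an object field.
doc = {"bigCode": split_big_code("ABCD1E2F34")}
print(doc)
```

Unlike the analyzer, this keeps the original casing, and a code of the wrong length is rejected instead of being tokenized unpredictably.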