Highlight words with whitespace in Elasticsearch 7.6
I would like to use Elasticsearch highlighting to obtain the matched keywords found inside a text. These are my settings/mappings:
{
"settings": {
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
"- => _"
]
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"description": {
"type": "text",
"analyzer": "my_analyzer",
"fielddata": true
}
}
}
}
I am using a char_filter to search and highlight hyphenated words. This is my document example:
{
"_index": "test_tokenizer",
"_type": "_doc",
"_id": "DbBIxXEBL7VGAl98vIRl",
"_score": 1.0,
"_source": {
"title": "Best places: New Mexico and Sedro-Woolley",
"description": "This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"
}
}
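To make the analyzer's effect on this text concrete, here is a rough Python approximation of my_analyzer. It is only a sketch, not Elasticsearch itself: the function name is made up for illustration, and the `\w+` regex merely stands in for the standard tokenizer on simple inputs like these.

```python
import re


def approximate_my_analyzer(text):
    """Rough stand-in for my_analyzer: mapping char filter, tokenizer, lowercase."""
    # char_filter: the mapping "- => _" rewrites hyphens before tokenization
    filtered = text.replace("-", "_")
    # tokenizer (approximation): \w+ keeps letters, digits and underscores
    # together, so a former hyphenated word stays a single token
    tokens = re.findall(r"\w+", filtered)
    # filter: lowercase every token
    return [token.lower() for token in tokens]


print(approximate_my_analyzer("New York, Rome and Milton-Freewater"))
# ['new', 'york', 'rome', 'and', 'milton_freewater']
```

The hyphenated name survives as one token, while "New York" is split into two tokens at the space. The `_analyze` API on the index is the authoritative way to check the real token stream.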
And this is the query I use:
{
"query": {
"query_string" : {
"query" : "\"New York\" OR \"Rome\" OR \"Milton-Freewater\"",
"default_field": "description"
}
},
"highlight" : {
"pre_tags" : ["<key>"],
"post_tags" : ["</key>"],
"fields" : {
"description" : {
"number_of_fragments" : 0
}
}
}
}
And this is the output I get:
...
"hits": [
{
"_index": "test_tokenizer",
"_type": "_doc",
"_id": "GrDNz3EBL7VGAl98EITg",
"_score": 0.72928625,
"_source": {
"title": "Best places: New Mexico and Sedro-Woolley",
"description": "This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"
},
"highlight": {
"description": [
"This is an example text containing some cities like <key>New</key> <key>York</key>, Toronto, <key>Rome</key> and many other. So, there are also <key>Milton-Freewater</key> and Las Vegas!"
]
}
}
]
...
Rome and Milton-Freewater are highlighted correctly. New York is not.
How can I have <key>New York</key> instead of <key>New</key> and <key>York</key>?
There is an open PR regarding this, but I'd suggest the following interim solution:
Add a term_vector setting (the fvh highlighter requires with_positions_offsets):
PUT test_tokenizer
{
"settings": {
"analysis": {
"char_filter": {
"my_char_filter": {
"type": "mapping",
"mappings": [
"- => _"
]
}
},
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"description": {
"type": "text",
"analyzer": "my_analyzer",
"term_vector": "with_positions_offsets",
"fielddata": true
}
}
}
}
Index the document:
POST test_tokenizer/_doc
{"title":"Best places: New Mexico and Sedro-Woolley","description":"This is an example text containing some cities like New York, Toronto, Rome and many other. So, there are also Milton-Freewater and Las Vegas!"}
Convert your query_string to a bunch of bool-should match_phrases inside the highlight_query and set the highlighter type to fvh:
GET test_tokenizer/_search
{
"query": {
"query_string": {
"query": "\"New York\" OR \"Rome\" OR \"Milton-Freewater\"",
"default_field": "description"
}
},
"highlight": {
"pre_tags": [
"<key>"
],
"post_tags": [
"</key>"
],
"fields": {
"description": {
"highlight_query": {
"bool": {
"should": [
{
"match_phrase": {
"description": "New York"
}
},
{
"match_phrase": {
"description": "Rome"
}
},
{
"match_phrase": {
"description": "Milton-Freewater"
}
}
]
}
},
"type": "fvh",
"number_of_fragments": 0
}
}
}
}
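If the list of phrases changes from request to request, the highlight section above can be generated instead of written by hand. The following is a minimal sketch, assuming the phrases are already available as a Python list; `build_highlight` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def build_highlight(field, phrases, pre_tag="<key>", post_tag="</key>"):
    """Build a highlight section with one match_phrase clause per phrase."""
    # One bool-should clause per phrase, mirroring the hand-written request
    should = [{"match_phrase": {field: phrase}} for phrase in phrases]
    return {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        "fields": {
            field: {
                "highlight_query": {"bool": {"should": should}},
                # fvh needs term_vector: with_positions_offsets on the field
                "type": "fvh",
                "number_of_fragments": 0,
            }
        },
    }


highlight = build_highlight("description", ["New York", "Rome", "Milton-Freewater"])
print(json.dumps(highlight, indent=2))
```

The resulting dictionary can be placed under the "highlight" key of the search body, next to whatever "query" is used for matching.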
The search request above yields:
{
"highlight":{
"description":[
"This is an example text containing some cities like <key>New York</key>, Toronto, <key>Rome</key> and many other. So, there are also <key>Milton-Freewater</key> and Las Vegas!"
]
}
}
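Since the original goal was to obtain the matched keywords, they can then be pulled out of the highlighted fragment on the client side. Here is a small sketch in Python, assuming the same `<key>` and `</key>` tags as in the request; `extract_highlighted` is an illustrative name.

```python
import re


def extract_highlighted(fragment, pre_tag="<key>", post_tag="</key>"):
    """Return the text between each pre_tag/post_tag pair, in order."""
    # Escape the tags so they are matched literally; (.*?) is non-greedy
    # so each highlighted span is captured separately
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, fragment, flags=re.DOTALL)


fragment = (
    "This is an example text containing some cities like <key>New York</key>, "
    "Toronto, <key>Rome</key> and many other. So, there are also "
    "<key>Milton-Freewater</key> and Las Vegas!"
)
print(extract_highlighted(fragment))
# ['New York', 'Rome', 'Milton-Freewater']
```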