Search through all document fields (nested and root document) in one multi match query
Take these basic documents as an example:
{
"name": "pants",
"description": "with stripes",
"items": [
{
"color": "red",
"size": "44"
},
{
"color": "blue",
"size": "38"
}
]
}
{
"name": "shirt",
"description": "with stripes",
"items": [
{
"color": "green",
"size": "40"
}
]
}
{
"name": "pants",
"description": "with dots",
"items": [
{
"color": "green",
"size": "38"
},
{
"color": "blue",
"size": "38"
}
]
}
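I need to find the first document with a search phrase such as pants stripes blue 38. All terms should be combined with AND, because I am not interested in pants with dots or in other size and color combinations.
My mapping looks like this: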
{
"settings": {
"index.queries.cache.enabled": true,
"index.number_of_shards": 1,
"index.number_of_replicas": 2,
"analysis": {
"filter": {
"german_stop": {
"type": "stop",
"stopwords": "_german_"
},
"german_stemmer": {
"type": "stemmer",
"language": "light_german"
},
"synonym": {
"type": "synonym_graph",
"synonyms_path": "dictionaries/de/synonyms.txt",
"updateable" : true
}
},
"analyzer": {
"index_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"german_stop",
"german_normalization",
"german_stemmer"
]
},
"search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym",
"german_stop",
"german_normalization",
"german_stemmer"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"description": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"items": {
"type": "nested",
"properties": {
"color": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
},
"size": {
"type": "text",
"analyzer": "index_analyzer",
"search_analyzer": "search_analyzer"
}
}
}
}
}
}
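Please ignore the fact that I am using German stop words and so on. I kept the example documents above in English so that everyone can follow them, but I did not adjust the mapping, because the original example is in German.
Ideally, I would like my query to look like this: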
{
"query": {
"nested": {
"path": "items",
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "and",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
}
}
The Search Profiler in Kibana shows that the query will be executed like this:
ToParentBlockJoinQuery (
+(
+(items.color:pant | items.size:pant | name:pant | description:pant)
+(items.color:strip | items.size:strip | name:strip | description:strip)
+(items.color:blu | items.size:blu | name:blu | description:blu)
+(items.color:38 | items.size:38 | name:38 | description:38)
) #_type:__items)
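In terms of AND and OR logic, this looks like exactly what I need: search every field for every term and join those per-term results with AND. Each search term therefore has to occur in one of the fields, but it does not matter in which one.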
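To make that per-term logic concrete, here is a minimal Python sketch of it. It is only a model, not how Elasticsearch is implemented: the `tokens` helper is a hypothetical stand-in for the analyzer (plain lowercasing and whitespace splitting, no stemming or stop words), and the flat document and field names are made up for illustration.

```python
def tokens(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def cross_fields_and(query, doc, fields):
    """Return True if every query term occurs in at least one of the fields."""
    field_tokens = [set(tokens(doc.get(field, ""))) for field in fields]
    return all(
        any(term in toks for toks in field_tokens)
        for term in tokens(query)
    )


# Hypothetical flat document, used only to illustrate the AND-of-ORs logic.
flat_doc = {
    "name": "pants",
    "description": "with stripes",
    "color": "blue",
    "size": "38",
}
fields = ["name", "description", "color", "size"]

# Every term is found in some field, so the document matches.
match_all_terms = cross_fields_and("pants stripes blue 38", flat_doc, fields)
# "dots" occurs in no field, so the AND fails.
match_missing_term = cross_fields_and("pants dots blue 38", flat_doc, fields)
```

Each `+( ... | ... )` clause in the profiler output corresponds to one `any(...)` over the fields, and the outer `+( ... )` corresponds to the `all(...)` over the terms.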
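However, this query seems to search only the nested documents. In fact, it looks as though a single query can search either the nested objects or the root document, but not both at once. If I remove the nested part, the Search Profiler shows the difference: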
{
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "and",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
which results in:
+(
+(items.color:pant | items.size:pant | name:pant | description:pant)
+(items.color:strip | items.size:strip | name:strip | description:strip)
+(items.color:blu | items.size:blu | name:blu | description:blu)
+(items.color:38 | items.size:38 | name:38 | description:38)
) #DocValuesFieldExistsQuery [field=_primary_term]
Both queries return zero results.
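One way to picture why both attempts come back empty: nested objects are indexed as separate hidden documents next to the root document, and a query is evaluated against one of those documents at a time. The sketch below models that in plain Python. The helper names are hypothetical, and the tokenizer is a simplification (lowercase plus whitespace split) rather than the German analyzer from the mapping.

```python
def tokens(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def term_centric_and(query, field_values):
    """Every query term must occur in some field of ONE indexed document."""
    seen = set()
    for value in field_values.values():
        seen.update(tokens(value))
    return all(term in seen for term in tokens(query))


def split_into_indexed_docs(source):
    """Model nested storage: one hidden doc per item, plus the root doc."""
    nested = [
        {"items." + key: value for key, value in item.items()}
        for item in source.get("items", [])
    ]
    root = {key: value for key, value in source.items() if key != "items"}
    return nested, root


source = {
    "name": "pants",
    "description": "with stripes",
    "items": [
        {"color": "red", "size": "44"},
        {"color": "blue", "size": "38"},
    ],
}
query = "pants stripes blue 38"
nested_docs, root_doc = split_into_indexed_docs(source)

# Inside the nested query only items.* fields exist, so "pants" is never found.
nested_hit = any(term_centric_and(query, doc) for doc in nested_docs)
# On the root document only name/description exist, so "blue" is never found.
root_hit = term_centric_and(query, root_doc)
```

In this model, neither the hidden item documents nor the root document contains all four terms on its own, which is consistent with both queries returning nothing.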
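So my question is whether there is a way to make the query above work, so that a multi match query really searches all of the listed fields (nested and root document) on a term-by-term basis.
I would like to avoid any preprocessing of the search terms that splits them up according to whether they belong to the nested or the root document, because that brings its own set of challenges. I do realize, though, that it would be one way to solve my problem.
Edit: The original documents have many more fields. The root document can have up to 250 fields, and each nested document may add another 20-30. Because the search terms need to be matched against many of those fields (possibly not all of them), any kind of concatenation of nested and root document fields to make them "searchable" seems impractical.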
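A flattened index might be a practical solution. By that I mean copying all root document fields into the nested documents and indexing only the nested documents. In this question, though, I would like to know whether it can also work with nested objects, without modifying the original structure.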
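For reference, the flattening described here could be sketched as a small preprocessing step before indexing. This is only an illustration under assumptions: `denormalize` is a hypothetical helper, not an Elasticsearch feature, and it assumes that root and nested field names do not collide (on a collision the nested value would win).

```python
def denormalize(source, nested_field="items"):
    """Return one flat document per nested item, each with a copy of the root fields."""
    root = {key: value for key, value in source.items() if key != nested_field}
    # Assumption: no key exists in both root and item; otherwise the item value wins.
    return [{**root, **item} for item in source.get(nested_field, [])]


source = {
    "name": "pants",
    "description": "with stripes",
    "items": [
        {"color": "red", "size": "44"},
        {"color": "blue", "size": "38"},
    ],
}

# Two flat documents: one for red/44 and one for blue/38.
flat_docs = denormalize(source)
```

The cost is that each root document is duplicated once per item, which matters when the root has a few hundred fields.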
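Your instinct about flattening is right, but you do not need to copy the root attributes into the nested fields. You can do the opposite via the include_in_root mapping parameter.
When you update your mapping like this: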
PUT inventory
{
"settings": {
...
},
"mappings": {
"properties": {
...
"items": {
"type": "nested",
"include_in_root": true, <---
"properties": {
...
}
}
}
}
}
and then index a few sample documents (at least one of which includes pants, since your original question did not contain any):
POST inventory/_doc
{"name":"shirt","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"38"}]}
POST inventory/_doc
{"name":"shirt","description":"with stripes","items":[{"color":"green","size":"40"}]}
POST inventory/_doc
{"name":"shirt","description":"with dots","items":[{"color":"green","size":"38"},{"color":"blue","size":"38"}]}
// this one should not match (size 39 instead of 38)
POST inventory/_doc
{"name":"pants","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"39"}]}
// this one *should* match
POST inventory/_doc
{"name":"pants","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"38"}]}
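You can then use your second query and keep the nested field paths unchanged, because they are now also available in the root document, albeit somewhat confusingly under the same dot paths: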
POST inventory/_search
{
"query": {
"multi_match": {
"query": "pants stripes blue 38",
"fields": [
"name",
"description",
"items.color",
"items.size"
],
"type": "cross_fields",
"operator": "AND",
"auto_generate_synonyms_phrase_query": "false",
"fuzzy_transpositions": "false"
}
}
}
and only the one fully matching document will be returned:
{
"name":"pants",
"description":"with stripes",
"items":[
{
"color":"red",
"size":"44"
},
{
"color":"blue",
"size":"38"
}
]
}
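As a rough mental model of why this works, include_in_root can be pictured as additionally indexing the nested values on the root document as flat, multi-valued fields. The Python sketch below mimics that for the five sample documents. It is a simplification, not Elasticsearch code: `flatten_into_root` and `matches` are hypothetical helpers, and tokenization is plain lowercasing and whitespace splitting instead of the analyzers from the mapping.

```python
def tokens(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def flatten_into_root(source):
    """Model include_in_root: nested values also become flat multi-valued root fields."""
    root = {key: [value] for key, value in source.items() if key != "items"}
    for item in source.get("items", []):
        for key, value in item.items():
            root.setdefault("items." + key, []).append(value)
    return root


def matches(query, root):
    """Every query term must occur in at least one value of at least one field."""
    seen = {tok for values in root.values() for value in values for tok in tokens(value)}
    return all(term in seen for term in tokens(query))


docs = [
    {"name": "shirt", "description": "with stripes",
     "items": [{"color": "red", "size": "44"}, {"color": "blue", "size": "38"}]},
    {"name": "shirt", "description": "with stripes",
     "items": [{"color": "green", "size": "40"}]},
    {"name": "shirt", "description": "with dots",
     "items": [{"color": "green", "size": "38"}, {"color": "blue", "size": "38"}]},
    {"name": "pants", "description": "with stripes",
     "items": [{"color": "red", "size": "44"}, {"color": "blue", "size": "39"}]},
    {"name": "pants", "description": "with stripes",
     "items": [{"color": "red", "size": "44"}, {"color": "blue", "size": "38"}]},
]
query = "pants stripes blue 38"
hits = [doc for doc in docs if matches(query, flatten_into_root(doc))]

# Caveat: flattening drops the pairing of color and size within one item.
cross_object = {"name": "pants", "description": "with stripes",
                "items": [{"color": "red", "size": "38"}, {"color": "blue", "size": "44"}]}
cross_object_hit = matches(query, flatten_into_root(cross_object))
```

In this model only the last sample document ends up in `hits`. Note that the hypothetical `cross_object` document matches as well, even though blue and 38 sit in different items, so if the color and size have to come from the same item, an additional nested clause on `items` would be needed.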