Searching all document fields (nested and root) in a single multi_match query

Question

Let's take these basic documents as an example:

{
  "name": "pants",
  "description": "with stripes",
  "items": [
    {
      "color": "red",
      "size": "44"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

{
  "name": "shirt",
  "description": "with stripes",
  "items": [
    {
      "color": "green",
      "size": "40"
    }
  ]
}

{
  "name": "pants",
  "description": "with dots",
  "items": [
    {
      "color": "green",
      "size": "38"
    },
    {
      "color": "blue",
      "size": "38"
    }
  ]
}

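I need to find the first document using a search string such as pants stripes blue 38. All terms should be combined with AND, because I'm not interested in pants with dots or in other combinations of size and color.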

My mapping looks like this:

{
  "settings": {
    "index.queries.cache.enabled": true,
    "index.number_of_shards": 1,
    "index.number_of_replicas": 2,
    "analysis": {
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        },
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        },
        "synonym": {
          "type": "synonym_graph",
          "synonyms_path": "dictionaries/de/synonyms.txt",
          "updateable" : true
        }
      },
      "analyzer": {
        "index_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_stop",
            "german_normalization",
            "german_stemmer"
          ]
        },
        "search_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "synonym",
            "german_stop",
            "german_normalization",
            "german_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "description": {
        "type": "text",
        "analyzer": "index_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "items": {
        "type": "nested",
        "properties": {
          "color": {
            "type": "text",
            "analyzer": "index_analyzer",
            "search_analyzer": "search_analyzer"
          },
          "size": {
            "type": "text",
            "analyzer": "index_analyzer",
            "search_analyzer": "search_analyzer"
          }
        }
      }
    }
  }
}

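Please ignore the fact that I'm using German stop words etc. I kept the example documents above in English so that everyone can follow them, but I did not adjust the mapping, because the original example was in German.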

So ideally, I'd like my query to look something like this:

{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "multi_match": {
          "query": "pants stripes blue 38",
          "fields": [
            "name",
            "description", 
            "items.color",
            "items.size"
          ],
          "type": "cross_fields",
          "operator": "and", 
          "auto_generate_synonyms_phrase_query": "false",
          "fuzzy_transpositions": "false"
        }
      }
    }
  }
}

Kibana's Search Profiler shows that the query is executed like this:

ToParentBlockJoinQuery (
+(
    +(items.color:pant | items.size:pant | name:pant | description:pant)
    +(items.color:strip | items.size:strip | name:strip | description:strip)
    +(items.color:blu | items.size:blu | name:blu | description:blu)
    +(items.color:38 | items.size:38 | name:38 | description:38)
) #_type:__items)

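In terms of AND and OR logic, this looks exactly like what I need: every property is searched for every term, and the results are combined with AND. So each search term has to occur in one of the fields, but it doesn't matter in which field it is found.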
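The term-centric semantics described here can be sketched in Python. This is a simplified model, not Elasticsearch code: the hypothetical `analyze` helper only lowercases and splits on whitespace (no stop words, light_german stemming or synonyms), and a document is represented as a dict mapping each field name to its list of values.

```python
def analyze(text):
    """Hypothetical stand-in for the analyzer: lowercase + whitespace split."""
    return text.lower().split()


def cross_fields_and(query, doc, fields):
    """Model of cross_fields with operator "and".

    Each query term becomes one required clause that is satisfied if the
    term occurs in *any* of the listed fields (the "+( a | b | c | d )"
    groups in the profiler output). Fields absent from `doc` never match.
    """
    terms_by_field = {
        field: {t for value in doc.get(field, []) for t in analyze(value)}
        for field in fields
    }
    return all(
        any(term in terms_by_field[field] for field in fields)
        for term in analyze(query)
    )


FIELDS = ["name", "description", "items.color", "items.size"]

# All four fields visible in one flat document (values are multi-valued).
doc = {
    "name": ["pants"],
    "description": ["with stripes"],
    "items.color": ["red", "blue"],
    "items.size": ["44", "38"],
}

print(cross_fields_and("pants stripes blue 38", doc, FIELDS))  # True
print(cross_fields_and("pants dots blue 38", doc, FIELDS))     # False
```

Which field a term is found in does not matter; the query only fails when some term is found in none of them, as "dots" is here.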

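However, this query seems to search only the nested documents. In fact, it seems that each query can search either the nested objects or the root document, but not both at the same time. If I remove the nested part, the Search Profiler shows the difference: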

{
  "query": {
    "multi_match": {
      "query": "pants stripes blue 38",
      "fields": [
        "name",
        "description",
        "items.color",
        "items.size"
      ],
      "type": "cross_fields",
      "operator": "and",
      "auto_generate_synonyms_phrase_query": "false",
      "fuzzy_transpositions": "false"
    }
  }
}

The result is:

+(
    +(items.color:pant | items.size:pant | name:pant | description:pant)
    +(items.color:strip | items.size:strip | name:strip | description:strip)
    +(items.color:blu | items.size:blu | name:blu | description:blu)
    +(items.color:38 | items.size:38 | name:38 | description:38)
) #DocValuesFieldExistsQuery [field=_primary_term]

Both queries return zero results.
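Why both queries come back empty can be illustrated with a similar simplified model. Conceptually, Elasticsearch indexes each object of a nested array as its own hidden Lucene document that contains only the items.* fields, while the root document holds name and description. The nested query evaluates its inner query against each hidden document, and the plain query sees only the root document, so no single document ever contains all four terms. The helpers `analyze`, `cross_fields_and` and `split_nested` are hypothetical and ignore analysis details.

```python
def analyze(text):
    """Hypothetical simplified analyzer: lowercase + whitespace split."""
    return text.lower().split()


def cross_fields_and(query, doc, fields):
    # Every query term must occur in at least one of the fields present in doc.
    terms = {f: {t for v in doc.get(f, []) for t in analyze(v)} for f in fields}
    return all(any(term in terms[f] for f in fields) for term in analyze(query))


def split_nested(source):
    """Split a _source into (root doc, hidden nested docs), no include_in_root."""
    root = {"name": [source["name"]], "description": [source["description"]]}
    nested = [
        {"items.color": [item["color"]], "items.size": [item["size"]]}
        for item in source["items"]
    ]
    return root, nested


FIELDS = ["name", "description", "items.color", "items.size"]
QUERY = "pants stripes blue 38"

source = {
    "name": "pants",
    "description": "with stripes",
    "items": [{"color": "red", "size": "44"}, {"color": "blue", "size": "38"}],
}
root, nested = split_nested(source)

# "nested" query: the inner query runs against each hidden document.
nested_hit = any(cross_fields_and(QUERY, d, FIELDS) for d in nested)
# Plain query: only the root document is searched.
root_hit = cross_fields_and(QUERY, root, FIELDS)

print(nested_hit, root_hit)  # False False
```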

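So my question is: is there a way to make the above query work, i.e. to genuinely search all defined fields (nested and root documents) in a multi_match query on a term-by-term basis?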

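I'd like to avoid any preprocessing of the search terms that splits them according to whether they belong to the nested or the root documents, because that comes with its own set of challenges. But I do know that this is one way to solve my problem.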

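Edit: The original documents have many more properties. A root document may have up to 250 fields, and each nested document may add another 20-30 fields. Because the search terms need to be matched against many of these fields (possibly not all of them), any kind of concatenation of nested and root document properties to make them "searchable" seems impractical.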

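A flattened index might be a practical solution. By that I mean copying all root document fields into the nested documents and indexing only the nested documents. But in this question I'd like to know whether it can also work with nested objects, without modifying the original structure.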
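The flattening idea (copy the root fields into every nested object and search only those records) can be sketched with the same kind of simplified model. `denormalize` is a hypothetical helper and the analyzer is simplified; the three documents are the ones from the question, and only the first one, through its blue/38 item, satisfies every term.

```python
def analyze(text):
    """Hypothetical simplified analyzer: lowercase + whitespace split."""
    return text.lower().split()


def cross_fields_and(query, doc, fields):
    # Every query term must occur in at least one of the fields.
    terms = {f: {t for v in doc.get(f, []) for t in analyze(v)} for f in fields}
    return all(any(term in terms[f] for f in fields) for term in analyze(query))


def denormalize(source):
    """One flat record per item, each carrying a copy of the root fields."""
    return [
        {
            "name": [source["name"]],
            "description": [source["description"]],
            "items.color": [item["color"]],
            "items.size": [item["size"]],
        }
        for item in source["items"]
    ]


FIELDS = ["name", "description", "items.color", "items.size"]
DOCS = [
    {"name": "pants", "description": "with stripes",
     "items": [{"color": "red", "size": "44"}, {"color": "blue", "size": "38"}]},
    {"name": "shirt", "description": "with stripes",
     "items": [{"color": "green", "size": "40"}]},
    {"name": "pants", "description": "with dots",
     "items": [{"color": "green", "size": "38"}, {"color": "blue", "size": "38"}]},
]


def matching_docs(query):
    # A source document matches if any of its flattened records matches.
    return [
        i for i, src in enumerate(DOCS)
        if any(cross_fields_and(query, rec, FIELDS) for rec in denormalize(src))
    ]


print(matching_docs("pants stripes blue 38"))  # [0]
```

Because every record holds exactly one item, this layout also keeps color and size of the same item together: "pants stripes red 38" matches nothing, since document 0 has no red item of size 38.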

Answer 1

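Your intuition about flattening is right, but you don't need to copy the root properties into the nested documents. You can do the opposite with the include_in_root mapping parameter.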

When you create the index with a mapping like this:

PUT inventory
{
  "settings": {
    ...
  },
  "mappings": {
    "properties": {
      ...
      "items": {
        "type": "nested",
        "include_in_root": true,     <---
        "properties": {
          ...
        }
      }
    }
  }
}

Then index some sample documents (at least one of which contains pants, since your original question didn't include any):

POST inventory/_doc
{"name":"shirt","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"38"}]}

POST inventory/_doc
{"name":"shirt","description":"with stripes","items":[{"color":"green","size":"40"}]}

POST inventory/_doc
{"name":"shirt","description":"with dots","items":[{"color":"green","size":"38"},{"color":"blue","size":"38"}]}

// near miss: pants with stripes and a blue item, but no item of size 38
POST inventory/_doc
{"name":"pants","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"39"}]}

// this one *should* match
POST inventory/_doc
{"name":"pants","description":"with stripes","items":[{"color":"red","size":"44"},{"color":"blue","size":"38"}]}

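After that, you can use your second query and leave the nested field paths as they are, because these fields are now also available at the root level, albeit somewhat confusingly under the same dotted paths: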

POST inventory/_search
{
  "query": {
    "multi_match": {
      "query": "pants stripes blue 38",
      "fields": [
        "name",
        "description",
        "items.color",
        "items.size"
      ],
      "type": "cross_fields",
      "operator": "AND",
      "auto_generate_synonyms_phrase_query": "false",
      "fuzzy_transpositions": "false"
    }
  }
}

Only the one fully matching document is returned:

{
  "name":"pants",
  "description":"with stripes",
  "items":[
    {
      "color":"red",
      "size":"44"
    },
    {
      "color":"blue",
      "size":"38"
    }
  ]
}
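What include_in_root does can be modeled the same way: besides the hidden nested documents, the values of every nested object are also added to the root document as plain multi-valued fields, so a non-nested query sees name, description, items.color and items.size together. A side effect worth knowing is that, at the root level, the association between values of the same item is lost. In the simplified model below, the hypothetical document `mixed` (red/38 and blue/44) also matches pants stripes blue 38, although none of its items is both blue and size 38. All helpers are hypothetical and use a simplified analyzer.

```python
def analyze(text):
    """Hypothetical simplified analyzer: lowercase + whitespace split."""
    return text.lower().split()


def cross_fields_and(query, doc, fields):
    # Every query term must occur in at least one of the fields.
    terms = {f: {t for v in doc.get(f, []) for t in analyze(v)} for f in fields}
    return all(any(term in terms[f] for f in fields) for term in analyze(query))


def root_with_include_in_root(source):
    """Root document as seen by a plain query when include_in_root is true."""
    return {
        "name": [source["name"]],
        "description": [source["description"]],
        # Values of all items are merged into flat multi-valued fields.
        "items.color": [item["color"] for item in source["items"]],
        "items.size": [item["size"] for item in source["items"]],
    }


def to_source(name, description, items):
    return {"name": name, "description": description,
            "items": [{"color": c, "size": s} for c, s in items]}


FIELDS = ["name", "description", "items.color", "items.size"]
QUERY = "pants stripes blue 38"

# The five documents indexed in this answer, in order.
DOCS = [
    ("shirt", "with stripes", [("red", "44"), ("blue", "38")]),
    ("shirt", "with stripes", [("green", "40")]),
    ("shirt", "with dots", [("green", "38"), ("blue", "38")]),
    ("pants", "with stripes", [("red", "44"), ("blue", "39")]),
    ("pants", "with stripes", [("red", "44"), ("blue", "38")]),
]

hits = [i for i, d in enumerate(DOCS)
        if cross_fields_and(QUERY, root_with_include_in_root(to_source(*d)), FIELDS)]
print(hits)  # [4]

# Caveat: hypothetical document where no single item is blue AND size 38.
mixed = to_source("pants", "with stripes", [("red", "38"), ("blue", "44")])
print(cross_fields_and(QUERY, root_with_include_in_root(mixed), FIELDS))  # True
```

include_in_root does not remove the hidden nested documents, so nested queries on items.color and items.size keep working wherever the per-item combination matters.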

Solution 1
0 2021-03-04 23:20:19
