elasticsearch - Comparing a nested field against another field in the same document

Question

I need to compare two fields within the same document, where the actual values do not matter. Consider this document:

_source: {
    id: 123,
    primary_content_type_id: 12,
    content: [
        {
            id: 4,
            content_type_id: 1,
            assigned: true
        },
        {
            id: 5,
            content_type_id: 12,
            assigned: false
        }
    ]
}

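I need to find all documents whose primary content is not assigned. I cannot find a way to compare primary_content_type_id against the nested content.content_type_id to make sure they hold the same value. Here is what I tried with a script. I don't think I fully understand scripts, but they may be the way to solve this: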

{
    "filter": {
        "nested": {
            "path": "content",
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "content.assigned": false
                            }
                        },
                        {
                            "script": {
                                "script": "primary_content_type_id==content.content_type_id"
                            }
                        }
                    ]
                }
            }
        }
    }
}

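Note that this works if I remove the script part of the filter, replace it with another term filter on content_type_id = 12, and also add a further filter on primary_content_type_id = 12. The problem is that I do not know (and for my use case it does not matter) what the value of primary_content_type_id or content.content_type_id is. What matters is that assigned is false for the content whose content_type_id matches primary_content_type_id.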
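The hard-coded variant described above might look roughly like the following, written as a small Python helper that builds the request body. The original query for this variant is not shown, so the helper name `hardcoded_filter` and the placement of the parent-level term filter outside the nested filter are assumptions.

```python
import json


def hardcoded_filter(type_id):
    """Build the hard-coded variant: both fields are compared to a known value.

    Hypothetical helper; the filter layout is an assumption based on the
    description in the question, using the same 1.x-style filter syntax.
    """
    return {
        "filter": {
            "bool": {
                "must": [
                    # Parent-level check, assumed to sit outside the nested filter.
                    {"term": {"primary_content_type_id": type_id}},
                    {
                        "nested": {
                            "path": "content",
                            "filter": {
                                "bool": {
                                    "must": [
                                        {"term": {"content.assigned": False}},
                                        {"term": {"content.content_type_id": type_id}},
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


print(json.dumps(hardcoded_filter(12), indent=2))
```

This only works because the value 12 is known in advance, which is exactly the limitation the question is about.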

Can Elasticsearch perform this check?

Answer 1

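With a nested search, you are searching the nested objects without their parent. Unfortunately, there is no hidden join that can be applied to nested objects.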

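At least for now, this means you do not receive both the "parent" and the nested document in the script. You can confirm this by replacing your script with each of these two scripts in turn and checking the results: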

# Parent Document does not exist
"script": {
  "script": "doc['primary_content_type_id'].value == 12"
}

# Nested Document should exist
"script": {
  "script": "doc['content.content_type_id'].value == 12"
}
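As a rough mental model of why the first script finds nothing, the sketch below treats each nested object as its own hidden document that carries only its own fields. This is a simplified toy in plain Python, not how Elasticsearch is implemented, and `split_nested` is a hypothetical helper.

```python
def split_nested(source, path):
    """Toy model: split a document into a parent and one child per nested object.

    Each child holds only the nested object's own fields, prefixed with the path.
    """
    parent = {k: v for k, v in source.items() if k != path}
    children = [
        {f"{path}.{k}": v for k, v in obj.items()} for obj in source[path]
    ]
    return parent, children


source = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

parent, children = split_nested(source, "content")

# A script inside the nested filter is evaluated against one child at a time,
# so the parent's field is simply not there to compare against.
for child in children:
    print("primary_content_type_id" in child, "content.content_type_id" in child)
```

In this model the parent field is absent from every child, which matches the behaviour the two test scripts are meant to reveal.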

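You can do this in a less performant way by looping across a plain object (rather than having ES do the work for you with nested). This means you would have to reindex the documents and their nested documents as single documents for it to work. Given the way you are trying to use it, this probably would not be very different, and it may even perform better (especially considering the lack of an alternative).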
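To illustrate what indexing content as a plain object implies, here is a toy Python sketch in which each sub-field becomes an independent multi-valued field. The helper `flatten_object_field` is hypothetical, and the sample document is a variation of the one in the question, chosen so that no single content item is both the primary type and unassigned.

```python
def flatten_object_field(source, path):
    """Toy model of a non-nested object array: one value list per sub-field."""
    flat = {}
    for obj in source[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


source = {
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": False},
        {"id": 5, "content_type_id": 12, "assigned": True},
    ],
}

flat = flatten_object_field(source, "content")

# Two independent term-style checks both pass, although no single object
# has content_type_id == 12 together with assigned == False.
naive_match = (
    12 in flat["content.content_type_id"] and False in flat["content.assigned"]
)
print(naive_match)
```

Because the pairing between sub-fields is lost in this layout, plain term filters alone cannot answer the question, and the per-object loop over the original objects becomes necessary.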

# This assumes that your default scripting language is Groovy (default in 1.4)
# Note1: "find" will loop across all of the values, but it will
#  appropriately short circuit if it finds any!
# Note2: It would be preferable to use doc throughout, but since we need the
#  arrays (plural!) to be in the _same_ order, then we need to parse the
#  _source. This inherently means that you must _store_ the _source, which
#  is the default. Parsing the _source only happens on the first touch.
"script": {
  "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && ! it.assigned } != null",
  "_cache" : true
}
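For readers unfamiliar with Groovy, the logic of the script above can be sketched in plain Python and run against the sample document from the question. The function name `has_unassigned_primary` is hypothetical; this only mirrors the predicate and does not talk to Elasticsearch.

```python
def has_unassigned_primary(source):
    """Return True if some content item has the primary type and is unassigned.

    Mirrors the Groovy `find { ... } != null` check; `any` also short-circuits
    on the first matching item.
    """
    primary = source["primary_content_type_id"]
    return any(
        item["content_type_id"] == primary and not item["assigned"]
        for item in source["content"]
    )


sample = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

print(has_unassigned_primary(sample))
```

For the sample document the check succeeds, because the item with id 5 has the primary content type and is not assigned.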

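I cache the result because nothing dynamic is happening here (for example, no dates are compared against now), so it is safe to cache, which makes future lookups much faster. Most filters are cached by default, but scripts are one of the few exceptions.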

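Because the script must compare both values to be sure it has found the correct inner object, you are duplicating some work, but in practice that is unavoidable. Keeping a term filter alongside it will most likely perform better than running this check on its own.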
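One way to read this advice is to pair the script with a term filter on the same field, so that documents without any unassigned content can be excluded without running the script. The sketch below builds such a request body in Python. It assumes content has been reindexed as a plain object (so content.assigned is addressable at the top level) and reuses the 1.x-style filter syntax from the question; `combined_filter` is a hypothetical helper.

```python
# The Groovy script from the answer, kept verbatim.
SCRIPT = (
    "_source.content.find { it.content_type_id == "
    "_source.primary_content_type_id && ! it.assigned } != null"
)


def combined_filter():
    """Build a filter that pairs a term filter with the cached script filter.

    Assumption: `content` is a plain object field, not a nested one.
    """
    return {
        "filter": {
            "bool": {
                "must": [
                    # Documents with no unassigned content at all fail this check.
                    {"term": {"content.assigned": False}},
                    # The script confirms that the unassigned item is the primary one.
                    {"script": {"script": SCRIPT, "_cache": True}},
                ]
            }
        }
    }


body = combined_filter()
print(body["filter"]["bool"]["must"][0])
```

The term filter duplicates part of what the script checks, which is the repeated work the answer describes as unavoidable.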

Answer 1
7 (accepted), 2014-11-22 06:50:31
