elasticsearch - Compare a nested field with another field in the document

Question

I need to compare 2 fields in the same document, where the actual values don't matter. Consider this document:

_source: {
    id: 123,
    primary_content_type_id: 12,
    content: [
        {
            id: 4,
            content_type_id: 1,
            assigned: true
        },
        {
            id: 5,
            content_type_id: 12,
            assigned: false
        }
    ]
}

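I need to find all documents where the primary content is not assigned. I can't find a way to compare primary_content_type_id against the nested content.content_type_id to make sure they hold the same value. This is what I tried with a script. I don't think I understand scripts, but they may be the way to solve this: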

{
    "filter": {
        "nested": {
            "path": "content",
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "content.assigned": false
                            }
                        },
                        {
                            "script": {
                                "script": "primary_content_type_id==content.content_type_id"
                            }
                        }
                    ]
                }
            }
        }
    }
}

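Note that it works fine if I remove the script part of the filter, replace it with another term filter where content_type_id = 12, and also add another filter where primary_content_id = 12. The problem is that I won't know (nor does it matter for my use case) what the values of primary_content_type_id or content.content_type_id are. All that matters is that assigned is false for the content whose content_type_id matches primary_content_type_id.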

Is this check possible with Elasticsearch?

Answer 1

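In the case of a nested search, you are searching the nested objects without the parent. Unfortunately, there is no hidden join that you can apply to nested objects.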

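At least currently, that means you do not receive both the "parent" and the nested document in the script. You can confirm this by replacing your script with each of these two scripts and testing the result: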

# Parent Document does not exist
"script": {
  "script": "doc['primary_content_type_id'].value == 12"
}

# Nested Document should exist
"script": {
  "script": "doc['content.content_type_id'].value == 12"
}
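One way to picture why the first script finds nothing: Elasticsearch indexes each nested object as its own hidden document, separate from the parent. The Python sketch below is only a simplified mental model of that split, not Elasticsearch code; the helper `split_nested` and the dict layout are hypothetical. It shows that a script evaluated against a nested document has no `primary_content_type_id` to read.

```python
# Simplified model of how a nested mapping splits one source document.
# split_nested is a hypothetical helper for illustration, not an ES API.

def split_nested(source, path):
    """Return (parent_doc, nested_docs), mimicking the nested/parent split."""
    parent = {k: v for k, v in source.items() if k != path}
    nested = [
        {f"{path}.{k}": v for k, v in obj.items()}
        for obj in source.get(path, [])
    ]
    return parent, nested


source = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

parent, nested_docs = split_nested(source, "content")

# A filter running inside the nested scope sees one nested document at a
# time, so the parent field is simply absent there.
parent_field_visible = ["primary_content_type_id" in doc for doc in nested_docs]
nested_field_visible = ["content.content_type_id" in doc for doc in nested_docs]
```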

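You can do this in a less performant way by looping across the objects yourself (rather than having ES do it for you with nested). This means you would have to reindex the document and its nested documents as a single document (a plain object field) for this to work. Considering the way you are trying to use it, this probably won't behave very differently, and it may even perform better (especially given the lack of an alternative).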

# This assumes that your default scripting language is Groovy (default in 1.4)
# Note1: "find" will loop across all of the values, but it will
#  appropriately short circuit if it finds any!
# Note2: It would be preferable to use doc throughout, but since we need the
#  arrays (plural!) to be in the _same_ order, then we need to parse the
#  _source. This inherently means that you must _store_ the _source, which
#  is the default. Parsing the _source only happens on the first touch.
"script": {
  "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && ! it.assigned } != null",
  "_cache" : true
}
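For readers less familiar with Groovy closures, the predicate above can be read as "does any content entry share the primary type id and is unassigned?". The following Python sketch restates that logic against the question's sample document; the function name `has_unassigned_primary` is made up for illustration, and the code does not run inside Elasticsearch.

```python
# Python restatement of the Groovy find { ... } != null predicate.
# has_unassigned_primary is a hypothetical name used only in this sketch.

def has_unassigned_primary(source):
    """True if some content entry matches the primary type and is unassigned."""
    primary = source["primary_content_type_id"]
    # any() short-circuits on the first match, like Groovy's find.
    return any(
        item["content_type_id"] == primary and not item["assigned"]
        for item in source["content"]
    )


unassigned_doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

# Same document, but with the primary content assigned.
assigned_doc = {
    "id": 124,
    "primary_content_type_id": 12,
    "content": [
        {"id": 6, "content_type_id": 1, "assigned": False},
        {"id": 7, "content_type_id": 12, "assigned": True},
    ],
}

matches = [
    doc["id"]
    for doc in (unassigned_doc, assigned_doc)
    if has_unassigned_primary(doc)
]
```

Note that the second document has an unassigned entry, but not for the primary type, which is why both conditions must be checked on the same entry.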

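I cache the result because nothing dynamic is happening here (for example, it is not comparing dates against now), so it is safe to cache, which makes future lookups much faster. Most filters are cached by default, but scripts are one of the few exceptions.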

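Because the script has to compare both values to be sure it found the correct inner object, you duplicate some amount of work, but that is practically unavoidable. Keeping a term filter alongside the script will most likely perform better than running this check on its own.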
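The reasoning behind keeping a term filter can be sketched as a cheap check that narrows the candidates before the more expensive per-document scan runs. The Python below is an illustration under that assumption only; `cheap_prefilter`, `expensive_check`, and `search` are hypothetical names, and real filter execution order in Elasticsearch is decided by the engine, not by this code.

```python
# Sketch: a cheap prefilter spares the expensive script on most documents.
# All function names here are hypothetical and exist only for illustration.

script_calls = 0  # counts how often the expensive check actually runs


def cheap_prefilter(source):
    """Roughly a term filter on content.assigned: false for an object field."""
    return any(not item["assigned"] for item in source["content"])


def expensive_check(source):
    """Stands in for the _source-parsing script filter."""
    global script_calls
    script_calls += 1
    primary = source["primary_content_type_id"]
    return any(
        item["content_type_id"] == primary and not item["assigned"]
        for item in source["content"]
    )


def search(docs):
    # "and" short-circuits, so the script only sees prefiltered documents.
    return [d["id"] for d in docs if cheap_prefilter(d) and expensive_check(d)]


docs = [
    {"id": 1, "primary_content_type_id": 12,
     "content": [{"content_type_id": 12, "assigned": True}]},
    {"id": 2, "primary_content_type_id": 12,
     "content": [{"content_type_id": 12, "assigned": False}]},
    {"id": 3, "primary_content_type_id": 7,
     "content": [{"content_type_id": 1, "assigned": False}]},
]

hits = search(docs)
```

The prefilter alone is not sufficient (document 3 passes it but is not a hit), so the script is still needed to tie both conditions to the same entry.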

Solution 1
7  Accepted  2014-11-22 06:50:31
