elasticsearch - comparing a nested field with another field in the document

I need to compare two fields in the same document, where the actual values do not matter. Consider this document:

_source: {
    id: 123,
    primary_content_type_id: 12,
    content: [
        {
            id: 4,
            content_type_id: 1,
            assigned: true
        },
        {
            id: 5,
            content_type_id: 12,
            assigned: false
        }
    ]
}

I need to find all documents in which the primary content is not assigned. I cannot find a way to compare primary_content_type_id with the nested content.content_type_id to ensure they hold the same value. This is what I have tried, using a script. I do not think I fully understand scripts, but they may be a way to solve this problem:

{
    "filter": {
        "nested": {
            "path": "content",
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "content.assigned": false
                            }
                        },
                        {
                            "script": {
                                "script": "primary_content_type_id==content.content_type_id"
                            }
                        }
                    ]
                }
            }
        }
    }
}

Note that it works fine if I remove the script portion of the filter, replace it with another term filter where content_type_id = 12, and add another filter where primary_content_type_id = 12. The problem is that I will not know (nor does it matter for my use case) what the values of primary_content_type_id or content.content_type_id are. All that matters is that assigned is false for the content whose content_type_id matches the primary_content_type_id.

Is this check possible with Elasticsearch?

With a nested search, you are searching the nested objects without the parent. Unfortunately, there is no hidden join that you can apply with nested objects.

At least currently, that means the script does not receive both the "parent" and the nested document. You can confirm this by replacing your script with each of these in turn and testing the result:

# Parent Document does not exist
"script": {
  "script": "doc['primary_content_type_id'].value == 12"
}

# Nested Document should exist
"script": {
  "script": "doc['content.content_type_id'].value == 12"
}
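
A toy model may help explain why the first script finds nothing. The Python sketch below assumes, as a simplification, that each nested object is indexed as its own hidden document carrying only the fields under its path. The helper nested_docs is hypothetical and is not an Elasticsearch API.

```python
from typing import Any, Dict, List


def nested_docs(source: Dict[str, Any], path: str) -> List[Dict[str, Any]]:
    """Toy model: turn each nested object into a separate hidden document
    whose field names are prefixed with the nested path."""
    return [
        {f"{path}.{key}": value for key, value in item.items()}
        for item in source[path]
    ]


doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

hidden = nested_docs(doc, "content")
for h in hidden:
    # The parent's field is simply not part of what the nested scope sees.
    print("primary_content_type_id" in h, h["content.content_type_id"])
```

In this model the parent field is never visible from a nested object, which is why a script inside the nested filter cannot compare the two values.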

You could do this in a less performant way by looping across objects yourself (rather than having ES do it for you with nested). This means that you would have to reindex your documents and nested documents as a single document for this to work. Considering how you are trying to use it, this probably wouldn't be too different, and it may even perform better (especially given the lack of an alternative).

# This assumes that your default scripting language is Groovy (default in 1.4)
# Note1: "find" will loop across all of the values, but it will
#  appropriately short circuit if it finds any!
# Note2: It would be preferable to use doc throughout, but since we need the
#  arrays (plural!) to be in the _same_ order, then we need to parse the
#  _source. This inherently means that you must _store_ the _source, which
#  is the default. Parsing the _source only happens on the first touch.
"script": {
  "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && ! it.assigned } != null",
  "_cache" : true
}
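
To sanity-check the predicate outside Elasticsearch, the closure above can be mirrored in plain Python. This is only a sketch: the function name has_unassigned_primary_content is made up for illustration, and it operates on a parsed _source dict rather than on anything Elasticsearch provides.

```python
from typing import Any, Dict


def has_unassigned_primary_content(source: Dict[str, Any]) -> bool:
    """Mirror of the Groovy `find` closure: True when some content entry
    has the primary content type and is not assigned."""
    primary = source.get("primary_content_type_id")
    # any() stops at the first match, like Groovy's find.
    return any(
        item.get("content_type_id") == primary and not item.get("assigned")
        for item in source.get("content", [])
    )


doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

print(has_unassigned_primary_content(doc))  # prints True
```

The example document from the question matches because the entry with content_type_id 12 has assigned set to false.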

I cached the result because nothing dynamic is occurring here (e.g., no comparing dates to now), so it is pretty safe to cache, which makes future lookups much faster. Most filters are cached by default, but scripts are one of the few exceptions.

Since the script must compare both values to be sure that it found the correct inner object, you are duplicating some amount of work, but that is practically unavoidable. Keeping the term filter alongside the script is most likely going to be superior to doing this check without it.
