
ElasticSearch: Filter on deeply nested data

Our data is stored in MongoDB 2.4.8 and indexed into ElasticSearch 0.90.7 using the ElasticSearch MongoDB River 1.7.3.

Our data indexes correctly, and I can successfully search the fields we want to search. But I also need to filter on permissions: of course, we only want to return results that the calling user can actually read.

In the code on our server, I have the calling user's authorizations as an array, for example:

[ "Role:REGISTERED_USER", "Account:52c74b25da06f102c90d52f4", "Role:USER", "Group:52cb057cda06ca463e78f0d7" ]

An example of the unit data we're searching follows:

{
    "_id" : ObjectId("52dffbd6da06422559386f7d"),
    "content" : "various stuff",
    "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
    "acls" : [
        {
            "accessMap" : {},
            "sourceClass" : "com.bulb.learn.domain.units.PublishedPageUnit",
            "sourceId" : ObjectId("52dffbd6da06422559386f7d")
        },
        {
            "accessMap" : {
                "Role:USER" : {
                    "allow" : [
                        "READ"
                    ]
                },
                "Account:52d96bfada0695fcbdb41daf" : {
                    "allow" : [
                        "CREATE",
                        "READ",
                        "UPDATE",
                        "DELETE",
                        "GRANT"
                    ]
                }
            },
            "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
            "sourceId" : ObjectId("52dffb54da06422559386f57")
        }
    ]
}

In the sample data above, I have replaced all the searchable content with "content" : "various stuff".

The authorization data is in the "acls" array. The filter I need to write would do the following (in plain English):

pass all units where the "acls" array
contains an "accessMap" object
that contains a property whose name is one of the user's authorization strings
and whose "allow" property contains "READ"
and whose "deny" property does not contain "READ"
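Before translating this rule into an ElasticSearch filter, it can help to pin its semantics down. Below is a minimal bash sketch that evaluates the rule locally against the sample unit. It assumes a hypothetical flattened format (one `key|allow|deny` line per accessMap entry, with comma-separated permissions), and the function names `contains` and `can_read` are illustrative only; this is not how ElasticSearch evaluates filters.

```bash
#!/bin/bash
# Sketch of the permission rule from the question, evaluated locally.
# Assumption (hypothetical format): each accessMap entry is flattened to one
# "key|allow|deny" line, where allow and deny are comma-separated lists.

# contains LIST ITEM: succeed if the comma-separated LIST contains ITEM
contains() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# can_read ENTRIES AUTH...: succeed if some entry whose key is one of the
# user's authorization strings allows READ and does not also deny READ
can_read() {
  local entries=$1 key allow deny auth
  shift
  while IFS='|' read -r key allow deny; do
    for auth in "$@"; do
      if [[ $key == "$auth" ]] && contains "$allow" READ && ! contains "$deny" READ; then
        return 0
      fi
    done
  done <<< "$entries"
  return 1
}

# The two non-empty accessMap entries from the sample unit, flattened
entries='Role:USER|READ|
Account:52d96bfada0695fcbdb41daf|CREATE,READ,UPDATE,DELETE,GRANT|'

if can_read "$entries" "Role:REGISTERED_USER" "Account:52c74b25da06f102c90d52f4" \
    "Role:USER" "Group:52cb057cda06ca463e78f0d7"; then
  echo "unit passes the filter"
else
  echo "unit is filtered out"
fi
```

With the example user's authorizations, the "Role:USER" entry allows READ and has no deny, so the sample unit passes.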

In the example above, the user has "Role:USER" authorization, and this unit has an accessMap containing "Role:USER", which contains "allow", which contains "READ"; "Role:USER" contains no "deny". So this unit would pass the filter.

I don't see how to write a filter for this using ElasticSearch.

I get the impression that there are two ways to deal with nested arrays like this: "nested", or "has_child" (or "has_parent").

We are reluctant to use the "nested" filter because it apparently requires the whole block to be re-indexed whenever any of the data changes. Searchable content and authorization data can change at any time in response to user actions.

It looks to me as though, in order to use "has_child" or "has_parent", the authorization data would have to be separate from the unit data (in a different collection?), and each document would need its parent or child specified when it is indexed. I don't know whether the ElasticSearch MongoDB River is capable of doing this.

So is this even possible? Or should we rearrange the authorization data?

You need to restructure your data a bit.

Using values as keys is problematic with Elasticsearch. Each distinct key ends up as a separate field, so you'll have an ever-growing mapping and, consequently, an ever-growing cluster state.

You probably want accessMap to be a list of objects, with what is currently a key stored as a value. Then it has to be nested; otherwise, you have no way of knowing which accessMap entry a matching "allow" belongs to.
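A minimal bash sketch of the suggested shape, where each former object key becomes the value of a "key" field. The helper names `json_array` and `access_entry` are hypothetical, and the sketch assumes keys and permissions contain no characters that need JSON escaping.

```bash
#!/bin/bash
# Hypothetical helpers that emit accessMap entries as a list of objects,
# storing the former object key as the value of a "key" field.
# Assumes keys and permissions need no JSON escaping.

# json_array ITEM...: print the ITEMs as a JSON array of strings
json_array() {
  local out="" item
  for item in "$@"; do
    out+="${out:+,}\"$item\""
  done
  printf '[%s]' "$out"
}

# access_entry KEY PERM...: print one accessMap entry with an "allow" list
access_entry() {
  local key=$1
  shift
  printf '{"key":"%s","allow":%s}' "$key" "$(json_array "$@")"
}

# Rebuild the sample unit's accessMap in the list-of-objects shape
access_map='['
access_map+=$(access_entry "Role:USER" READ)
access_map+=','
access_map+=$(access_entry "Account:52d96bfada0695fcbdb41daf" CREATE READ UPDATE DELETE GRANT)
access_map+=']'

echo "$access_map"
```

Because every entry now has the same fields ("key", "allow", "deny"), the mapping stays fixed no matter how many subjects appear.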

Whether the ACLs should be nested (resulting in two levels of nesting) or use parent-child depends a bit on how often you update the various objects. By having them as nested docs on the object, you pay the cost of joining every time something is updated. If you use parent-child, you pay the join cost on every search.

This quickly gets complicated, so I prepared a simplified runnable example you can play with: https://www.found.no/play/gist/8582654

Note how the nested and bool filters are, erm, nested. It wouldn't work to have two nested with a bool in it.

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {}
    },
    "mappings": {
        "type": {
            "properties": {
                "acls": {
                    "type": "nested",
                    "properties": {
                        "accessMap": {
                            "type": "nested",
                            "properties": {
                                "allow": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "deny": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "key": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type","_id":1}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"type","_id":2}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","deny":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"type","_id":3}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "nested": {
                    "path": "acls",
                    "filter": {
                        "bool": {
                            "must": {
                                "nested": {
                                    "path": "acls.accessMap",
                                    "filter": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term": {
                                                        "allow": "READ"
                                                    }
                                                },
                                                {
                                                    "terms": {
                                                        "key": [
                                                            "Role:USER",
                                                            "Account:52d96bfada0695fcbdb41daf"
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                }
                            },
                            "must_not": {
                                "nested": {
                                    "path": "acls.accessMap",
                                    "filter": {
                                        "bool": {
                                            "must": [
                                                {
                                                    "term": {
                                                        "deny": "READ"
                                                    }
                                                },
                                                {
                                                    "terms": {
                                                        "key": [
                                                            "Role:USER",
                                                            "Account:52d96bfada0695fcbdb41daf"
                                                        ]
                                                    }
                                                }
                                            ]
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
'

For completeness, here is a similar example for the parent-child approach: https://www.found.no/play/gist/8586840

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {
        "analysis": {}
    },
    "mappings": {
        "acl": {
            "_parent": {
                "type": "document"
            },
            "properties": {
                "acls": {
                    "properties": {
                        "accessMap": {
                            "type": "nested",
                            "properties": {
                                "key": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "allow": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                },
                                "deny": {
                                    "type": "string",
                                    "index": "not_analyzed"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"document","_id":1}}
{"title":"Doc 1"}
{"index":{"_index":"play","_type":"acl","_parent":1}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","allow":["READ","UPDATE"]}]}]}
{"index":{"_index":"play","_type":"document","_id":2}}
{"title":"Doc 2"}
{"index":{"_index":"play","_type":"acl","_parent":2}}
{"acls":[{"accessMap":[{"key":"Role:USER","allow":["READ"]},{"key":"Account:52d96bfada0695fcbdb41daf","deny":["READ","UPDATE"]}]}]}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "has_child": {
                    "type": "acl",
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "nested": {
                                        "path": "acls.accessMap",
                                        "filter": {
                                            "bool": {
                                                "must": [
                                                    {
                                                        "terms": {
                                                            "key": [
                                                                "Role:USER",
                                                                "Account:52d96bfada0695fcbdb41daf"
                                                            ]
                                                        }
                                                    },
                                                    {
                                                        "term": {
                                                            "allow": "READ"
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    }
                                }
                            ],
                            "must_not": [
                                {
                                    "nested": {
                                        "path": "acls.accessMap",
                                        "filter": {
                                            "bool": {
                                                "must": [
                                                    {
                                                        "terms": {
                                                            "key": [
                                                                "Role:USER",
                                                                "Account:52d96bfada0695fcbdb41daf"
                                                            ]
                                                        }
                                                    },
                                                    {
                                                        "term": {
                                                            "deny": "READ"
                                                        }
                                                    }
                                                ]
                                            }
                                        }
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}
'

Thanks, @Alex Brasetvik. Your suggestion to make the subject IDs data instead of keys, and your explanation that nested is "join-per-update" but parent-child is "join-per-query", were most helpful.

I see that I would have to "un-nest" the data to use the parent-child approach, and we prefer to keep the authorization data nested.

I don't understand what you meant by "It wouldn't work to have two nested with a bool in it."

Here's how I refactored the data:

{
    "_id" : ObjectId("52dffbd6da06422559386f7d"),
    "content" : "various stuff",
    "ownerId" : ObjectId("52d96bfada0695fcbdb41daf"),
    "accessMaps" : [
        {
            "sourceClass" : "com.bulb.learn.domain.units.PublishedPageUnit",
            "sourceId" : ObjectId("52dffbd6da06422559386f7d")
        },
        {
            "allow" : {
                "CREATE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "READ" : [
                    "Account:52d96bfada0695fcbdb41daf",
                    "Role:USER"
                ],
                "UPDATE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "DELETE" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ],
                "GRANT" : [
                    "Account:52d96bfada0695fcbdb41daf"
                ]
            },
            "deny" : {},
            "sourceClass" : "com.bulb.learn.domain.units.CompositeUnit",
            "sourceId" : ObjectId("52dffb54da06422559386f57")
        }
    ]
}
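The refactored data inverts the original map: instead of subject to allowed permissions, it stores permission to subjects. A minimal bash sketch of that inversion, assuming a hypothetical "SUBJECT=PERM,PERM" input format and a hypothetical `subjects_with` helper:

```bash
#!/bin/bash
# Sketch of the inversion used in the refactored data: instead of
# subject -> allowed permissions, store permission -> subjects.
# Input pairs use a hypothetical "SUBJECT=PERM,PERM" format.

# subjects_with PERM PAIR...: print, one per line, each SUBJECT whose
# comma-separated permission list contains PERM
subjects_with() {
  local perm=$1 pair subject perms
  shift
  for pair in "$@"; do
    subject=${pair%%=*}   # text before the first "="
    perms=${pair#*=}      # text after the first "="
    case ",$perms," in
      *",$perm,"*) printf '%s\n' "$subject" ;;
    esac
  done
}

# The original per-subject "allow" lists from the sample unit
pairs=(
  "Role:USER=READ"
  "Account:52d96bfada0695fcbdb41daf=CREATE,READ,UPDATE,DELETE,GRANT"
)

# Print one line per permission, listing the subjects that hold it
for perm in CREATE READ UPDATE DELETE GRANT; do
  echo "$perm: $(subjects_with "$perm" "${pairs[@]}" | paste -sd, -)"
done
```

The order of subjects within each list does not matter for a terms filter.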

The new mapping looks like this:

{
  "unit": {
    "properties": {
      "accessMaps": {
        "type": "nested",
        "properties": {
          "allow": {
            "type": "nested",
            "properties": {
              "CREATE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "DELETE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "GRANT": {
                "type": "string",
                "index": "not_analyzed"
              },
              "READ": {
                "type": "string",
                "index": "not_analyzed"
              },
              "UPDATE": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "deny": {
            "type": "nested",
            "properties": {
              "CREATE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "DELETE": {
                "type": "string",
                "index": "not_analyzed"
              },
              "GRANT": {
                "type": "string",
                "index": "not_analyzed"
              },
              "READ": {
                "type": "string",
                "index": "not_analyzed"
              },
              "UPDATE": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "sourceClass": {
            "type": "string"
          },
          "sourceId": {
            "type": "string"
          }
        }
      }
    }
  }
}

And the filtered query looks like this:

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "nested": {
              "path": "accessMaps.allow",
              "filter": {
                "terms": {
                  "accessMaps.allow.READ": [
                    "Role:REGISTERED_USER",
                    "Account:52e6a361da06e4eb64172519",
                    "Role:USER",
                    "Group:52cb057cda06ca463e78f0d7"
                  ]
                }
              }
            }
          },
          "must_not": {
            "nested": {
              "path": "accessMaps.deny",
              "filter": {
                "terms": {
                  "accessMaps.deny.READ": [
                    "Role:REGISTERED_USER",
                    "Account:52e6a361da06e4eb64172519",
                    "Role:USER",
                    "Group:52cb057cda06ca463e78f0d7"
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}

The biggest problems I had were figuring out how to use the "path" property in the nested filter, and realizing that the field name in the terms filter must be fully qualified. I wish ElasticSearch would put more effort into their documentation.
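The query above hard-codes one user's authorizations. A minimal bash sketch that builds the same filter from an authorization array, with each terms field fully qualified as "path.field" inside its nested filter. The function names are hypothetical, and the sketch assumes the authorization strings need no JSON escaping.

```bash
#!/bin/bash
# Sketch: build the READ permission filter from the calling user's
# authorizations, using fully-qualified field names in each nested filter.
# Hypothetical helper names; assumes no JSON escaping is needed.

# json_array ITEM...: print the ITEMs as a JSON array of strings
json_array() {
  local out="" item
  for item in "$@"; do
    out+="${out:+,}\"$item\""
  done
  printf '[%s]' "$out"
}

# nested_terms PATH FIELD AUTH...: a nested filter with one terms clause
# whose field name is fully qualified as PATH.FIELD
nested_terms() {
  local path=$1 field=$2
  shift 2
  printf '{"nested":{"path":"%s","filter":{"terms":{"%s.%s":%s}}}}' \
    "$path" "$path" "$field" "$(json_array "$@")"
}

# read_filter AUTH...: READ allowed for some auth, denied for none
read_filter() {
  printf '{"bool":{"must":%s,"must_not":%s}}' \
    "$(nested_terms accessMaps.allow READ "$@")" \
    "$(nested_terms accessMaps.deny READ "$@")"
}

auths=("Role:REGISTERED_USER" "Account:52e6a361da06e4eb64172519" "Role:USER" "Group:52cb057cda06ca463e78f0d7")
read_filter "${auths[@]}"
echo
```

Because the allow and deny checks are independent nested filters, a READ denial for any one of the user's authorizations excludes the unit, even if another authorization allows it.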

Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM