How to filter by keys through a nested dictionary in a pythonic way

I am trying to filter a nested dictionary. My solution is clunky, and I was hoping there is a better method, perhaps using comprehensions. Only dictionaries and lists matter for this example.

_dict_key_filter() filters the keys of a nested dictionary or a list of nested dictionaries. Anything not in obj_filter is ignored at every nesting level.

obj: can be a dictionary or a list of dictionaries.

obj_filter: has to be a list of filter values.

import copy

def _dict_key_filter(obj, obj_filter):
    if isinstance(obj, dict):
        retdict = {}
        for key, value in obj.items():
            if key in obj_filter:
                # matching key: keep the whole branch
                retdict[key] = copy.deepcopy(value)
            elif isinstance(value, (dict, list)):
                # no match here: keep searching deeper
                child = _dict_key_filter(value, obj_filter)
                if child:
                    retdict[key] = child
        return retdict if retdict else None
    elif isinstance(obj, list):
        retlist = []
        for value in obj:
            child = _dict_key_filter(value, obj_filter)
            if child:
                retlist.append(child)
        return retlist if retlist else None
    else:
        return None

Example:

dict1 = {'test1': {'test2': [1, 2]}, 'test3': [{'test6': 2},
         {'test8': {'test9': 23}}], 'test4': {'test5': 5}}

obj_filter = ['test5', 'test9']

result = _dict_key_filter(dict1, obj_filter)

The return value would be {'test3': [{'test8': {'test9': 23}}], 'test4': {'test5': 5}}

It's a really old question. I came across a similar problem recently.

It may be obvious, but you are dealing with a tree in which each node has an arbitrary number of children. You want to cut the subtrees that do not contain any of the given items as nodes (not leaves). To achieve this, you are using a custom DFS: the main function returns either a subtree or None. If the value is None, then you "cut" the branch.

First of all, the function dict_key_filter returns a (non-empty) dict, a (non-empty) list, or None if no filter key was found in the branch. To reduce complexity, you could return a collection in every case: an empty collection if no filter key was found, and a non-empty collection if you are still searching or you found a leaf of the tree. Your code would look like:

def dict_key_filter(obj, obj_filter):
    if isinstance(obj, dict):
        retdict = {}
        ...
        return retdict # empty or not
    elif isinstance(obj, list):
        retlist = []
        ...
        return retlist # empty or not
    else:
        return [] # obviously empty

This was the easy part. Now we have to fill in the dots.

The list case

Let's begin with the list case, since it is the easier one to refactor:

retlist = []
for value in obj:
    child = dict_key_filter(value, obj_filter)
    if child:
        retlist.append(child)

We can translate this into a simple list comprehension:

retlist = [dict_key_filter(value, obj_filter) for value in obj if dict_key_filter(value, obj_filter)]

The drawback is that dict_key_filter is evaluated twice for every element that is kept. We can avoid this with a little trick (see https://stackoverflow.com/a/15812866 ):

retlist = [subtree for subtree in (dict_key_filter(value, obj_filter) for value in obj) if subtree]
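One way to check the difference is to count calls. The sketch below uses a hypothetical stand-in function, expensive, in place of dict_key_filter; the counter and the sample data are illustrative only. It also shows an assignment expression, which on Python 3.8 or later is another way to get one call per element.

```python
counter = {"calls": 0}

def expensive(value):
    """Hypothetical stand-in for dict_key_filter: counts how often it runs."""
    counter["calls"] += 1
    return [value] if value % 2 else []   # empty list for even values

data = [1, 2, 3, 4]

# Naive comprehension: the function runs in the condition for every element
# and a second time for each element that passes.
counter["calls"] = 0
naive = [expensive(v) for v in data if expensive(v)]
naive_calls = counter["calls"]

# Nested generator: one call per element.
counter["calls"] = 0
nested = [sub for sub in (expensive(v) for v in data) if sub]
nested_calls = counter["calls"]

# Assignment expression (Python 3.8+): also one call per element.
counter["calls"] = 0
walrus = [sub for v in data if (sub := expensive(v))]
walrus_calls = counter["calls"]

print(naive_calls, nested_calls, walrus_calls)  # 6 4 4
```

All three forms build the same list; only the number of calls differs.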

The inner expression (dict_key_filter(value, obj_filter) for value in obj) is a generator that calls dict_key_filter once per value. But we can do even better if we build a closure around dict_key_filter:

def dict_key_filter(obj, obj_filter):
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)

    ...

    retlist = list(filter(len, map(inner_dict_key_filter, obj)))

Now we are in the functional world: map applies inner_dict_key_filter to every element of the list, and then the subtrees are filtered to exclude the empty ones (len(subtree) is truthy iff subtree is not empty). Now, the code looks like:

def dict_key_filter(obj, obj_filter):
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)

    if isinstance(obj, dict):
        retdict = {}
        ...
        return retdict
    elif isinstance(obj, list):
        return list(filter(len, map(inner_dict_key_filter, obj)))
    else:
        return []
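To look at the map/filter pipeline in isolation, here is a minimal sketch. The helper keep_even is a hypothetical stand-in for inner_dict_key_filter (it returns a possibly empty list), and the sample rows are made up for illustration.

```python
def keep_even(numbers):
    """Hypothetical stand-in for inner_dict_key_filter: may return an empty list."""
    return [n for n in numbers if n % 2 == 0]

rows = [[1, 3], [2, 5], [], [4, 6]]

# Explicit loop, shaped like the original list case.
looped = []
for row in rows:
    child = keep_even(row)
    if child:
        looped.append(child)

# Functional form: map applies the function, filter drops the empty results.
functional = list(filter(len, map(keep_even, rows)))

print(functional)  # [[2], [4, 6]]
```

Both forms call the helper once per element. Note that filter(len, ...) only works when every result supports len, which is why the else branch returns an empty list rather than None.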

If you are familiar with functional programming, the list case is readable (not quite as readable as it would be in Haskell, but still readable).

The dict case

I have not forgotten the dictionary-comprehension tag on your question. The first idea is to create a function that returns either a whole copy of the branch or the result of the rest of the DFS.

def build_subtree(key, value):
    if key in obj_filter:
        return copy.deepcopy(value) # keep the branch
    elif isinstance(value, (dict, list)):
        return inner_dict_key_filter(value) # continue to search
    return [] # just an orphan value here

As in the list case, we do not reject empty subtrees for now:

retdict = {}
for key, value in obj.items():
    retdict[key] = build_subtree(key, value)

We now have a perfect case for a dict comprehension:

retdict = {key: build_subtree(key, value) for key, value in obj.items() if build_subtree(key, value)}

Again, we use the little trick to avoid computing a value twice:

retdict = {key:subtree for key, subtree in ((key, build_subtree(key, value)) for key, value in obj.items()) if subtree}

But we have a little problem here: the code above is not exactly equivalent to the original code. What if the value is 0? In the original version, we have retdict[key] = copy.deepcopy(0), but in the new version we have nothing: the value 0 evaluates as false and is filtered out. The dict may then become empty, and we wrongly cut the branch. We need another test to be sure we want to remove a value: if it is an empty list or dict, remove it; otherwise keep it:

def to_keep(subtree): return not (isinstance(subtree, (dict, list)) and len(subtree) == 0)

That is:

def to_keep(subtree): return not isinstance(subtree, (dict, list)) or subtree

If you remember a bit of logic ( https://en.wikipedia.org/wiki/Truth_table#Logical_implication ), you can interpret this as: if subtree is a dict or a list, then it must not be empty.

Let's put the pieces together:
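The implication can be checked on a few sample values. This is only a sketch: to_keep_explicit is a name introduced here for the "not (container and empty)" form, the sample values are arbitrary, and bool() is added so that both functions return a plain True or False.

```python
def to_keep(subtree):
    # "If subtree is a dict or a list, then it must not be empty."
    return not isinstance(subtree, (dict, list)) or bool(subtree)

def to_keep_explicit(subtree):
    # Same predicate written as "not (container and empty)".
    # The `and` short-circuits, so len() is never called on a non-container.
    return not (isinstance(subtree, (dict, list)) and len(subtree) == 0)

samples = [0, "", None, 23, [], {}, [0], {"a": 0}]
for value in samples:
    assert to_keep(value) == to_keep_explicit(value)
    print(repr(value), to_keep(value))
```

Only the empty list and the empty dict are rejected; falsy scalars such as 0, "" and None are kept.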

import copy

def dict_key_filter(obj, obj_filter):
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)
    def to_keep(subtree): return not isinstance(subtree, (dict, list)) or subtree

    def build_subtree(key, value):
        if key in obj_filter:
            return copy.deepcopy(value) # keep the branch
        elif isinstance(value, (dict, list)):
            return inner_dict_key_filter(value) # continue to search
        return [] # just an orphan value here

    if isinstance(obj, dict):
        key_subtree_pairs = ((key, build_subtree(key, value)) for key, value in obj.items())
        return {key:subtree for key, subtree in key_subtree_pairs if to_keep(subtree)}
    elif isinstance(obj, list):
        return list(filter(to_keep, map(inner_dict_key_filter, obj)))
    return []

I don't know if this is more pythonic, but it seems clearer to me.

dict1 = {
    'test1': { 'test2':[1,2] }, 
    'test3': [
        {'test6': 2}, 
        {
            'test8': { 'test9': 23 }
        }
    ],
    'test4':{'test5': 0}
}

obj_filter = ['test5' , 'test9']

print(dict_key_filter(dict1, obj_filter))
# {'test3': [{'test8': {'test9': 23}}], 'test4': {'test5': 0}}
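Because matching branches go through copy.deepcopy, the filtered result should be independent of the input. The sketch below repeats the function (with inner_dict_key_filter shortened to inner) so that it runs on its own; the sample data and the filter key are chosen only for illustration.

```python
import copy

def dict_key_filter(obj, obj_filter):
    # Same logic as the assembled function, repeated so the sketch is self-contained.
    def inner(obj): return dict_key_filter(obj, obj_filter)
    def to_keep(subtree): return not isinstance(subtree, (dict, list)) or subtree

    def build_subtree(key, value):
        if key in obj_filter:
            return copy.deepcopy(value)   # keep the branch
        elif isinstance(value, (dict, list)):
            return inner(value)           # continue to search
        return []                         # just an orphan value here

    if isinstance(obj, dict):
        pairs = ((key, build_subtree(key, value)) for key, value in obj.items())
        return {key: subtree for key, subtree in pairs if to_keep(subtree)}
    elif isinstance(obj, list):
        return list(filter(to_keep, map(inner, obj)))
    return []

source = {'test1': {'test2': [1, 2]}, 'test4': {'test5': 0}}
result = dict_key_filter(source, ['test2'])
result['test1']['test2'].append(3)   # mutate the filtered copy only

print(result)   # {'test1': {'test2': [1, 2, 3]}}
print(source)   # {'test1': {'test2': [1, 2]}, 'test4': {'test5': 0}}
```

If the copies are not needed, dropping deepcopy would avoid the extra work, at the cost of sharing the matched branches with the input.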
