How to filter by keys through a nested dictionary in a pythonic way
I am trying to filter a nested dictionary. My solution is clunky, and I was hoping to see whether there is a better method, perhaps something using comprehensions. Only dictionaries and lists are of interest for this example.
_dict_key_filter() filters the keys of a nested dictionary or a list of nested dictionaries. Anything not in obj_filter is ignored on all nested levels.
obj: can be a dictionary or a list of dictionaries.
obj_filter: has to be a list of filter values.
import copy

def _dict_key_filter(self, obj, obj_filter):
    if isinstance(obj, dict):
        retdict = {}
        for key, value in obj.items():
            if key in obj_filter:
                # matching key: keep the whole branch
                retdict[key] = copy.deepcopy(value)
            elif isinstance(value, (dict, list)):
                # no match yet: keep searching deeper
                child = self._dict_key_filter(value, obj_filter)
                if child:
                    retdict[key] = child
        return retdict if retdict else None
    elif isinstance(obj, list):
        retlist = []
        for value in obj:
            child = self._dict_key_filter(value, obj_filter)
            if child:
                retlist.append(child)
        return retlist if retlist else None
    else:
        return None
Example:
dict1 = {'test1': {'test2': [1, 2]}, 'test3': [{'test6': 2},
         {'test8': {'test9': 23}}], 'test4': {'test5': 5}}
obj_filter = ['test5', 'test9']
result = self._dict_key_filter(dict1, obj_filter)  # called from inside the owning class
The return value would be {'test3': [{'test8': {'test9': 23}}], 'test4': {'test5': 5}}
It's a really old question, but I came across a similar problem recently.
It may be obvious, but you are dealing with a tree in which each node has an arbitrary number of children. You want to cut the subtrees that do not contain any of the filter keys as nodes (not leaves). To achieve this, you are using a custom DFS: the main function returns either a subtree or None. If the value is None, then you "cut" the branch.
First of all, the function dict_key_filter returns a (non-empty) dict, a (non-empty) list, or None if no filter key was found in the branch. To reduce complexity, you could return a sequence in every case: an empty sequence if no filter key was found, and a non-empty sequence if you are still searching or you found the leaf of the tree. Your code would look like:
def dict_key_filter(obj, obj_filter):
    if isinstance(obj, dict):
        retdict = {}
        ...
        return retdict  # empty or not
    elif isinstance(obj, list):
        retlist = []
        ...
        return retlist  # empty or not
    else:
        return []  # obviously empty
This was the easy part. Now we have to fill in the dots.
list case
Let's begin with the list case, since it is the easier one to refactor:
retlist = []
for value in obj:
    child = dict_key_filter(value, obj_filter)
    if child:
        retlist.append(child)
We can translate this into a simple list comprehension:
retlist = [dict_key_filter(value, obj_filter) for value in obj if dict_key_filter(value, obj_filter)]
The drawback is that dict_key_filter is evaluated twice. We can avoid this with a little trick (see https://stackoverflow.com/a/15812866 ):
retlist = [subtree for subtree in (dict_key_filter(value, obj_filter) for value in obj) if subtree]
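To make the difference in call counts visible, here is a minimal sketch. The names expensive and call_log are made up for this illustration; expensive simply stands in for dict_key_filter and records each call. The last variant uses an assignment expression, which is an alternative available from Python 3.8 and not something the linked answer relies on.

```python
call_log = []

def expensive(value):
    """Hypothetical stand-in for dict_key_filter that records every call."""
    call_log.append(value)
    return [value] if value % 2 == 0 else []

values = [1, 2, 3, 4]

# Naive comprehension: the function runs once in the condition and once more
# in the output expression for every element that passes the test.
call_log.clear()
naive = [expensive(v) for v in values if expensive(v)]
naive_calls = len(call_log)

# Nested generator: each result is computed once, then tested.
call_log.clear()
nested = [sub for sub in (expensive(v) for v in values) if sub]
nested_calls = len(call_log)

# Assignment expression (Python 3.8+): the result is bound inside the condition.
call_log.clear()
walrus = [sub for v in values if (sub := expensive(v))]
walrus_calls = len(call_log)

print(naive_calls, nested_calls, walrus_calls)  # 6 4 4
print(naive == nested == walrus)                # True
```

All three variants build the same list; only the number of calls differs.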
The inner expression (dict_key_filter(value, obj_filter) for value in obj) is a generator that calls dict_key_filter once per value. But we can do even better if we build a closure of dict_key_filter:
def dict_key_filter(obj, obj_filter):
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)
    ...
    retlist = list(filter(len, map(inner_dict_key_filter, obj)))
Now we are in the functional world: map applies inner_dict_key_filter to every element of the list, and then the subtrees are filtered to exclude the empty ones (len(subtree) is truthy iff subtree is not empty). Now, the code looks like:
def dict_key_filter(obj, obj_filter):
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)
    if isinstance(obj, dict):
        retdict = {}
        ...
        return retdict
    elif isinstance(obj, list):
        return list(filter(len, map(inner_dict_key_filter, obj)))
    else:
        return []
If you are familiar with functional programming, the list case is readable (not quite as readable as it would be in Haskell, but still readable).
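For readers less used to the map/filter style, the following sketch shows the same idiom on toy data, next to the equivalent comprehension. The helper keep_matching and the sample data are invented for this illustration, and functools.partial is used here as one possible substitute for the hand-written closure; it is not what the code above does.

```python
from functools import partial

def keep_matching(values, wanted):
    """Hypothetical helper: return the items of `values` found in `wanted`."""
    return [v for v in values if v in wanted]

rows = [[1, 2, 3], [4, 5], [6, 1]]

# partial() fixes the second argument, playing the role of the inner closure.
keep_wanted = partial(keep_matching, wanted={1, 6})

# Functional form: map, then drop the empty results (len(...) == 0 is falsy).
functional = list(filter(len, map(keep_wanted, rows)))

# Equivalent comprehension over the same map object.
comprehension = [sub for sub in map(keep_wanted, rows) if sub]

print(functional)                   # [[1], [6, 1]]
print(functional == comprehension)  # True
```

Which form reads better is largely a matter of taste; both call the helper once per element.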
dict case
I have not forgotten the dictionary-comprehension tag in your question. The first idea is to create a function that returns either a whole copy of the branch or the result of the rest of the DFS.
def build_subtree(key, value):
    if key in obj_filter:
        return copy.deepcopy(value)  # keep the branch
    elif isinstance(value, (dict, list)):
        return inner_dict_key_filter(value)  # continue to search
    return []  # just an orphan value here
As in the list case, we do not reject empty subtrees for now:
retdict = {}
for key, value in obj.items():
    retdict[key] = build_subtree(key, value)
We now have a perfect case for a dict comprehension:
retdict = {key: build_subtree(key, value) for key, value in obj.items() if build_subtree(key, value)}
Again, we use the little trick to avoid computing a value twice:
retdict = {key:subtree for key, subtree in ((key, build_subtree(key, value)) for key, value in obj.items()) if subtree}
But we have a little problem here: the code above is not exactly equivalent to the original code. What if the value is 0? In the original version, we have retdict[key] = copy.deepcopy(0), but in the new version we have nothing. The 0 value is evaluated as false and filtered out. The dict may then become empty, and we wrongfully cut the branch. We need another test to be sure we want to remove a value: if it's an empty list or dict, then remove it, else keep it:
def to_keep(subtree): return not (isinstance(subtree, (dict, list)) and len(subtree) == 0)
That is:
def to_keep(subtree): return not isinstance(subtree, (dict, list)) or subtree
If you remember a bit of logic ( https://en.wikipedia.org/wiki/Truth_table#Logical_implication ) you can interpret this as: if subtree is a dict or a list, then it must not be empty.
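The implication can be checked on a few sample values. The sketch below compares to_keep with a more literal spelling of the rule "remove only an empty dict or list"; the name to_keep_explicit and the sample values are made up for this illustration.

```python
def to_keep(subtree):
    # Implication form: "is a dict or list" implies "is not empty".
    return not isinstance(subtree, (dict, list)) or subtree

def to_keep_explicit(subtree):
    # Literal form: reject only an empty dict or an empty list.
    return not (isinstance(subtree, (dict, list)) and len(subtree) == 0)

samples = [0, "", None, 23, [], {}, [0], {"a": 0}]

for sample in samples:
    # to_keep may return the container itself, so compare truthiness.
    assert bool(to_keep(sample)) == to_keep_explicit(sample)

kept = [sample for sample in samples if to_keep(sample)]
print(kept)  # [0, '', None, 23, [0], {'a': 0}]
```

Falsy scalars such as 0, the empty string and None survive; only the empty containers are dropped.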
Let's put the pieces together:
import copy

def dict_key_filter(obj, obj_filter):
    # closure that fixes obj_filter for the recursive calls
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)
    # drop only empty dicts and lists; falsy scalars such as 0 are kept
    def to_keep(subtree): return not isinstance(subtree, (dict, list)) or subtree
    def build_subtree(key, value):
        if key in obj_filter:
            return copy.deepcopy(value)  # keep the branch
        elif isinstance(value, (dict, list)):
            return inner_dict_key_filter(value)  # continue to search
        return []  # just an orphan value here
    if isinstance(obj, dict):
        key_subtree_pairs = ((key, build_subtree(key, value)) for key, value in obj.items())
        return {key: subtree for key, subtree in key_subtree_pairs if to_keep(subtree)}
    elif isinstance(obj, list):
        return list(filter(to_keep, map(inner_dict_key_filter, obj)))
    return []
I don't know if this is more pythonic, but it seems clearer to me.
dict1 = {
    'test1': {'test2': [1, 2]},
    'test3': [
        {'test6': 2},
        {
            'test8': {'test9': 23}
        }
    ],
    'test4': {'test5': 0}
}
obj_filter = ['test5', 'test9']
print(dict_key_filter(dict1, obj_filter))
# {'test3': [{'test8': {'test9': 23}}], 'test4': {'test5': 0}}
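A few more checks may help to see how the function behaves at the edges. The snippet below repeats the function so that it runs by itself; the sample data and the key name 'keep' are invented for this illustration.

```python
import copy

def dict_key_filter(obj, obj_filter):
    def inner_dict_key_filter(obj): return dict_key_filter(obj, obj_filter)
    def to_keep(subtree): return not isinstance(subtree, (dict, list)) or subtree
    def build_subtree(key, value):
        if key in obj_filter:
            return copy.deepcopy(value)  # keep the branch
        elif isinstance(value, (dict, list)):
            return inner_dict_key_filter(value)  # continue to search
        return []  # just an orphan value here
    if isinstance(obj, dict):
        key_subtree_pairs = ((key, build_subtree(key, value)) for key, value in obj.items())
        return {key: subtree for key, subtree in key_subtree_pairs if to_keep(subtree)}
    elif isinstance(obj, list):
        return list(filter(to_keep, map(inner_dict_key_filter, obj)))
    return []

source = {'a': {'keep': [1, 2]}, 'b': [{'x': 1}, {'keep': None}], 'c': 5}
result = dict_key_filter(source, ['keep'])
print(result)  # {'a': {'keep': [1, 2]}, 'b': [{'keep': None}]}

# The kept branch is a deep copy: changing it leaves the source untouched.
result['a']['keep'].append(3)
print(source['a']['keep'])  # [1, 2]

# A list of dictionaries at the top level works as well.
print(dict_key_filter([{'keep': 1}, {'z': 2}], ['keep']))  # [{'keep': 1}]

# No key matches: an empty dict comes back instead of None.
print(dict_key_filter({'z': {'y': 1}}, ['keep']))  # {}
```

One difference from the code in the question is worth knowing: a matching key whose value is an empty list or dict, such as {'keep': []}, is dropped by to_keep here, whereas the question's version would keep it.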