Merging multiple dictionaries that have dictionaries in a list
I have several dictionaries (perhaps tens of them) that are formed like below:
{'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 1},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
I want to combine all those dictionaries by adding up the integer under the 'count' key for entries that have the same 'foo', 'bar' and 'host' values (None is NoneType).
For example, for 2 dictionaries:
dictA = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
dictB = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
{'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
{'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
Then the merged version should be:
dictMerged = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 415},
{'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 4},
{'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 2},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1}],
'stderr': ''}
Note that the order of the dictionaries in the list changed after the 'count' values were summed.
As a first step, I tried to combine them by 'host' alone, as shown below, but the result was not what I wanted:
hostname1 = {i["host"]: i for i in dictA['stdout']}
hostname2 = {i["host"]: i for i in dictB['stdout']}
all_host = hostname1|hostname2
{key: value + b[key] for key, value in a.items()}
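To see why the attempt above does not sum anything, here is a minimal sketch. It assumes Python 3.9+ for the `|` operator and uses shortened, hypothetical sample lists (`stdout_a`, `stdout_b`) rather than the full data above. Keying only by 'host' and merging with `|` keeps the right-hand entry whenever a host appears on both sides, so the counts are overwritten rather than added.

```python
# Shortened, hypothetical sample data in the same shape as dictA['stdout'] / dictB['stdout'].
stdout_a = [{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28}]
stdout_b = [{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 3},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46}]

by_host_a = {d['host']: d for d in stdout_a}
by_host_b = {d['host']: d for d in stdout_b}

# Dict union: for a host present on both sides, the right-hand value wins.
merged = by_host_a | by_host_b

print(merged['apple']['count'])  # 3, not the desired 5
print(sorted(merged))            # ['apple', 'egg', 'orange']
```

Keying by 'host' alone would also merge entries that differ in 'foo' or 'bar', which is why the key needs all three values.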
One approach is to group the entries by (foo, bar, host) with a defaultdict:
from collections import defaultdict
from operator import itemgetter
# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
# note: this assumes each (foo, bar, host) combination occurs at most once within dictB
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
for d in dictA["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)
# use item getter for better readability
count = itemgetter("count")
# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]
# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)
Output
{'stderr': '',
'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
{'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
{'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
{'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
{'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
{'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
{'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}
For more on each of the functions and data structures used in the code above, see the documentation for sorted, defaultdict and itemgetter.
A second approach is to use groupby:
import pprint
from operator import itemgetter
from itertools import groupby
def key(d):
    return d["foo"], d["bar"], d["host"] or ""
count = itemgetter("count")
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)
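This second approach has two caveats:
- The grouping step is O(n log n), because the input must be sorted before groupby; in the first approach the grouping was O(n).
- None has to be replaced by the empty string "" in the sort key, because None cannot be compared with a str when sorting. The `h or None` in the comprehension then turns "" back into None.
If you have multiple dictionaries you can change the first approach to: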
# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
# create a list with all the dictionaries from multiple dict
data = []
lst = [dictA] # change this line to contain all the dictionaries except B
for d in lst:
    data.extend(d["stdout"])
for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)
# use item getter for better readability
count = itemgetter("count")
# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]
# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
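If you would rather not single out dictB, the same idea can be wrapped in a function that takes any number of dictionaries. This is only a sketch under some assumptions: `merge_stdout` and the sample names `d1`, `d2`, `d3` are hypothetical, every input is assumed to have a 'stdout' list, and 'stderr' is assumed to stay empty as in the examples. It also sums directly into a defaultdict(int) instead of collecting lists first.

```python
from collections import defaultdict


def merge_stdout(*dicts):
    """Sum 'count' over all entries sharing the same (foo, bar, host)."""
    totals = defaultdict(int)
    for d in dicts:
        for entry in d["stdout"]:
            totals[(entry["foo"], entry["bar"], entry["host"])] += entry["count"]
    merged = [{"foo": f, "bar": b, "host": h, "count": c}
              for (f, b, h), c in totals.items()]
    # Sort by decreasing count; the sort is stable, so ties keep first-seen order.
    merged.sort(key=lambda e: e["count"], reverse=True)
    # Assumption: 'stderr' is not merged and is left empty.
    return {"stdout": merged, "stderr": ""}


# Hypothetical, shortened inputs in the same shape as dictA and dictB.
d1 = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 135},
                 {"foo": "C", "bar": "F", "host": "carrot", "count": 1}],
      "stderr": ""}
d2 = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 280},
                 {"foo": "C", "bar": "F", "host": "carrot", "count": 1}],
      "stderr": ""}
d3 = {"stdout": [{"foo": "A", "bar": "B", "host": "orange", "count": 46}],
      "stderr": ""}

res = merge_stdout(d1, d2, d3)
for entry in res["stdout"]:
    print(entry)
```

Because every input goes through the same loop, a (foo, bar, host) combination that repeats inside a single dictionary is summed as well.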
What is itemgetter? From the documentation:
Return a callable object that fetches item from its operand using the operand's __getitem__() method. If multiple items are specified, returns a tuple of lookup values.
Is equivalent to:
def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
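As a small illustration of both branches, here is itemgetter applied to one entry shaped like those in the question. The variable names `entry`, `count` and `key` are chosen only for this example.

```python
from operator import itemgetter

entry = {'foo': 'A', 'bar': 'B', 'host': None, 'count': 135}

count = itemgetter('count')             # one item: returns the value itself
key = itemgetter('foo', 'bar', 'host')  # several items: returns a tuple

print(count(entry))  # 135
print(key(entry))    # ('A', 'B', None)
```

The multi-item form could also be used to build the (foo, bar, host) grouping key in the first approach.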