
Merging multiple dictionaries that contain lists of dictionaries

I have several dictionaries (perhaps tens of them) that are shaped like the one below:

{'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 1},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

I want to combine all of these dictionaries by summing the integer under the 'count' key for entries that share the same 'foo', 'bar' and 'host' values (None is NoneType).

For example, for 2 dictionaries:

dictA = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

dictB = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
            {'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

Then the merged version should be:

dictMerged = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 415},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 4},
            {'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 2},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1}],
 'stderr': ''}

Note that the order of the dictionaries in the list changed after the 'count' values were summed.

As a first step, I tried to combine them by matching 'host' as shown below, but the result was not what I wanted:

hostname1 = {i["host"]: i for i in dictA['stdout']}
hostname2 = {i["host"]: i for i in dictB['stdout']}
all_host = hostname1|hostname2
{key: value + b[key] for key, value in a.items()}
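A minimal sketch of why this attempt falls short, using two shortened, made-up lists (`rows_a` and `rows_b` are illustrative names, not part of the original data). Keying on 'host' alone ignores 'foo' and 'bar', and the `|` operator keeps the right-hand value for a duplicated key rather than summing anything:

```python
# Hypothetical, shortened inputs for illustration only.
rows_a = [{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2}]
rows_b = [{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 5},
          {'foo': 'X', 'bar': 'Y', 'host': 'apple', 'count': 7}]

by_host_a = {row['host']: row for row in rows_a}
# Inside this comprehension the second 'apple' row silently replaces the
# first one, even though its 'foo' and 'bar' values differ.
by_host_b = {row['host']: row for row in rows_b}

# Dict union (Python 3.9+) keeps the right-hand value for a shared key;
# no counts are added together.
merged = by_host_a | by_host_b
print(merged)
```

Three input rows collapse into a single entry here, which suggests the grouping key needs to include 'foo' and 'bar' as well as 'host'.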

One approach

from collections import defaultdict
from operator import itemgetter
from pprint import pprint

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
# (every row is appended, so duplicate keys inside a single input are not lost)
groups = defaultdict(list)
for d in dictA["stdout"] + dictB["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use itemgetter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order of count
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
pprint(res)

Output

{'stderr': '',
 'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
            {'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
            {'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
            {'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
            {'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
            {'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
            {'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}

For more on each of the functions and data structures used in the code above, see the documentation for sorted, defaultdict and itemgetter.

One alternative

Use groupby:

import pprint
from operator import itemgetter
from itertools import groupby


def key(d):
    return d["foo"], d["bar"], d["host"] or ""


count = itemgetter("count")
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
pprint.pprint(res)

This second approach has two caveats:

  1. Grouping takes O(n log n) time because the combined list has to be sorted first, whereas grouping with a dictionary in the first approach takes O(n).
  2. To sort the list of dictionaries, None has to be replaced by the empty string "", because None cannot be compared with strings.
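The second caveat can be checked with a short sketch. The rows below are made up for illustration, and `naive_key` and `safe_key` are hypothetical helper names. Note that the `h or None` step in the groupby code would also turn a genuine empty-string host into None; the data in the question has no such host, but if yours might, a key along the lines of `safe_key` is one possible way to keep the two apart:

```python
# Made-up rows: one host is None and one is a real empty string.
rows = [
    {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
    {'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
    {'foo': 'A', 'bar': 'B', 'host': '', 'count': 5},
]


def naive_key(d):
    return d['foo'], d['bar'], d['host']


try:
    sorted(rows, key=naive_key)
    sort_failed = False
except TypeError:
    # In Python 3, None and str cannot be ordered against each other.
    sort_failed = True


def safe_key(d):
    # (False, '') for None sorts before (True, host) for any string,
    # so None and '' remain distinct keys.
    host = d['host']
    return d['foo'], d['bar'], (host is not None, host or '')


ordered_hosts = [d['host'] for d in sorted(rows, key=safe_key)]
print(sort_failed, ordered_hosts)
```

With a key like this, the host would be read back from the grouped rows themselves rather than from the key tuple.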

Multiple dictionaries

If you have multiple dictionaries, you can change the first approach to:

from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list)

# create a list with all the rows from the multiple dictionaries
data = []
lst = [dictA, dictB]  # change this line to contain all the dictionaries
for d in lst:
    data.extend(d["stdout"])

# every row is appended, so duplicate keys inside a single input are not lost
for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
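As a variation on the code above, the same idea can be wrapped in a function that accepts any number of dictionaries. This is only a sketch: `merge_counts` is a hypothetical name, it accumulates integer totals in a `defaultdict(int)` instead of collecting lists of rows, and it assumes 'stderr' can simply be set to an empty string, as in the examples in the question. The inputs `r1`, `r2` and `r3` are made up.

```python
from collections import defaultdict


def merge_counts(results):
    """Sum 'count' over all rows that share the same (foo, bar, host)."""
    totals = defaultdict(int)
    for result in results:
        for row in result['stdout']:
            totals[(row['foo'], row['bar'], row['host'])] += row['count']
    rows = [{'foo': f, 'bar': b, 'host': h, 'count': c}
            for (f, b, h), c in totals.items()]
    # Largest count first, matching the ordering used in the question.
    rows.sort(key=lambda row: row['count'], reverse=True)
    return {'stdout': rows, 'stderr': ''}  # assumption: stderr stays empty


# Made-up inputs for illustration.
r1 = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
                 {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
      'stderr': ''}
r2 = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280}],
      'stderr': ''}
r3 = {'stdout': [{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1},
                 {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46}],
      'stderr': ''}

merged = merge_counts([r1, r2, r3])
print(merged)
```

Because a tuple containing None is hashable, None works as part of the dictionary key without any substitution.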

What is itemgetter?

From the documentation:

Return a callable object that fetches item from its operand using the operand's __getitem__() method. If multiple items are specified, returns a tuple of lookup values.

Equivalent to:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
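A short usage sketch of the behaviour described above, applied to rows shaped like those in the question (the variable names are illustrative):

```python
from operator import itemgetter

row = {'foo': 'A', 'bar': 'B', 'host': None, 'count': 135}

get_count = itemgetter('count')             # one item: returns the value itself
get_key = itemgetter('foo', 'bar', 'host')  # several items: returns a tuple

single = get_count(row)
multiple = get_key(row)
print(single, multiple)

# As a sort key it behaves like lambda d: d['count'].
rows = [row, {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28}]
smallest_first = sorted(rows, key=get_count)
print([d['count'] for d in smallest_first])
```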
