Python: merging dict rows with a SUM

Question

I have a lot of dict rows, more than 10 million, that look like this:

{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'}
{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'}
{'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'}
{'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}

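Is it possible to combine the rows whose other keys and values are all identical into a single row, with 'bytes' holding the sum? That should speed up the later processing steps. I would like to minimize the number of rows and end up with this: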

{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '60'}
{'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'}
{'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}

Thanks in advance.

Answer 1

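Using an intermediate dictionary indexed on all the "other" keys, you can accumulate the 'bytes' value for each combination of the other fields in a common dictionary. Then convert the indexed values back into a list of dictionaries: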

lst = [{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
       {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
       {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
       {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]

merged = dict()
for d in lst:
    k = map(d.get,sorted({*d}-{"bytes"}))  # index on all other fields
    m = merged.setdefault(tuple(k),d)      # add/get first instance
    if m is not d:                         # accumulate bytes (as strings) 
        m['bytes'] = str(int(m['bytes']) + int(d['bytes']))
mergedList = list(merged.values())

print(mergedList)
[{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '60'},
 {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
 {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]

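This works without sorting the rows (that is, in O(n) time) even if your data is not grouped by the combination of the other fields. It also works if the keys appear in a different order. Missing keys would be a problem, but they could be supported by using a comprehension instead of map(d.get, ...).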
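The missing-key case could be handled by keying on (field, value) pairs rather than on values alone, so that rows with different sets of fields never share an index. What follows is only a minimal sketch of that idea: the helper name merge_rows and the sample rows are assumptions made for illustration, not part of the original answer. Unlike the loop above, it copies each first row instead of updating it in place.

```python
def merge_rows(rows, sum_field="bytes"):
    """Merge rows whose other fields match, summing sum_field (kept as a string)."""
    merged = {}
    for row in rows:
        # Sorted (field, value) pairs: independent of key order, and rows with
        # different field sets get different keys.
        key = tuple(sorted((k, v) for k, v in row.items() if k != sum_field))
        if key in merged:
            target = merged[key]
            target[sum_field] = str(int(target[sum_field]) + int(row[sum_field]))
        else:
            merged[key] = dict(row)  # copy so the input rows are not modified
    return list(merged.values())


rows = [
    {'value_01': '123', 'datacenter': '1', 'bytes': '25'},
    {'datacenter': '1', 'value_01': '123', 'bytes': '35'},  # same fields, other order
    {'value_02': '123', 'datacenter': '1', 'bytes': '5'},   # different field set
]
result = merge_rows(rows)
print(result)
```

With values-only indexing, the third row would produce the same tuple ('1', '123') as the first two and be merged into them; with (field, value) pairs it stays separate.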

Note that you really should store the byte counts as integers rather than strings.
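One way to follow this advice is to convert the field once, when the rows are read, so that later steps add integers directly. This is a sketch under the assumption that every 'bytes' value is a valid decimal string; the names raw_rows and rows are illustrative only.

```python
raw_rows = [
    {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
    {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
]

# Convert 'bytes' to int once, leaving the other fields untouched.
rows = [{**row, 'bytes': int(row['bytes'])} for row in raw_rows]

# Later steps can then sum without repeated int()/str() round trips.
total = sum(row['bytes'] for row in rows)
print(rows[0]['bytes'], total)
```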

Answer 2

The code below should work:

from collections import defaultdict

lst = [{'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '25'},
       {'value_01': '123', 'value_02': '456', 'datacenter': '1', 'bytes': '35'},
       {'value_01': '678', 'value_02': '901', 'datacenter': '2', 'bytes': '55'},
       {'value_01': '678', 'value_02': '456', 'datacenter': '2', 'bytes': '15'}]
keys = ['value_01', 'value_02', 'datacenter']
data = defaultdict(int)
for entry in lst:
    key = tuple(entry[k] for k in keys)  # group on the non-'bytes' fields
    data[key] += int(entry['bytes'])
print(data)

Output

defaultdict(<class 'int'>, {('123', '456', '1'): 60, ('678', '901', '2'): 55, ('678', '456', '2'): 15})
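The result above is a mapping from tuples to totals, whereas the question asks for a list of dicts. A possible way to convert it back is sketched below. It assumes the same keys list and the totals printed above; the name merged_list is hypothetical, and the totals are turned back into strings only to match the question's format.

```python
keys = ['value_01', 'value_02', 'datacenter']
# Totals as printed above, keyed by the tuple of grouping values.
data = {('123', '456', '1'): 60, ('678', '901', '2'): 55, ('678', '456', '2'): 15}

merged_list = [
    {**dict(zip(keys, key)), 'bytes': str(total)}  # pair each field name with its value
    for key, total in data.items()
]
print(merged_list)
```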
