Aggregate dictionaries in a list based on key values

Question

I'm struggling to wrap my head around this one. I've got a list with multiple dictionaries that I would like to aggregate based on two values. Example code:

>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 44 },
...     { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 46 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
...     ]

I'm trying to aggregate the dicts that have the same age and regex, summing the count key across all matching entries. Example output would be:

>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 244 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 90 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
...     ]

I would like to do this without pandas or add-on modules, and would prefer a solution from the standard library if at all possible.

Thanks!

Answer 1

You can use collections.defaultdict:

from collections import defaultdict

data = [{'regex': '.*ccc-r.*', 'age': 44, 'count': 224}, {'regex': '.*nft-r.*', 'age': 23, 'count': 44}, {'regex': '.*ccc-r.*', 'age': 44, 'count': 20}, {'regex': '.*ccc-r.*', 'age': 32, 'count': 16}, {'regex': '.*nft-r.*', 'age': 23, 'count': 46}, {'regex': '.*zxy-r.*', 'age': 16, 'count': 55}]

# Sum the counts per (regex, age) pair; missing keys start at 0.
d = defaultdict(int)
for i in data:
    d[(i['regex'], i['age'])] += i['count']

# Rebuild the list of dicts from the aggregated totals.
r = [{'regex': a, 'age': b, 'count': c} for (a, b), c in d.items()]

Output:

[{'regex': '.*ccc-r.*', 'age': 44, 'count': 244}, 
 {'regex': '.*nft-r.*', 'age': 23, 'count': 90}, 
 {'regex': '.*ccc-r.*', 'age': 32, 'count': 16}, 
 {'regex': '.*zxy-r.*', 'age': 16, 'count': 55}]
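
If the same grouping is needed for other field combinations, the defaultdict approach above could be wrapped in a small helper. This is only a sketch: the function name `aggregate` and its parameters `key_fields` and `sum_field` are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict

def aggregate(rows, key_fields, sum_field):
    """Sum `sum_field` over rows that share the same values for `key_fields`."""
    totals = defaultdict(int)
    for row in rows:
        # Build a hashable tuple key from the grouping fields.
        key = tuple(row[f] for f in key_fields)
        totals[key] += row[sum_field]
    # Rebuild one dict per group; dicts keep insertion order (Python 3.7+),
    # so groups appear in the order they were first seen.
    return [
        {**dict(zip(key_fields, key)), sum_field: total}
        for key, total in totals.items()
    ]

data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

result = aggregate(data, ("regex", "age"), "count")
print(result)
```

The input list is left untouched, since the helper only reads from each row and builds new dictionaries for the result.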

Answer 2

Assuming you do not want to use any imports, you can first collect the data in a dictionary, aggregated_data, whose keys are (regex, age) tuples and whose values are the summed counts. Once you have built this dictionary, you can rebuild the original list-of-dicts structure from it:

data = [
    { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
    { "regex": ".*nft-r.*", "age": 23, "count": 44 },
    { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
    { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
    { "regex": ".*nft-r.*", "age": 23, "count": 46 },
    { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
]

aggregated_data = {}

for dictionary in data:
    key = (dictionary['regex'], dictionary['age'])
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary['count']

data = [{'regex': key[0], 'age': key[1], 'count': value} for key, value in aggregated_data.items()]
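
To make the intermediate step concrete, the sketch below repeats the loop from this answer on the question's sample data and prints what aggregated_data holds before it is turned back into a list. The variable names follow the answer; nothing else is assumed.

```python
data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

aggregated_data = {}
for dictionary in data:
    key = (dictionary["regex"], dictionary["age"])
    # dict.get(key, 0) returns 0 the first time a key is seen.
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary["count"]

# One line per (regex, age) group with its summed count.
for key, value in aggregated_data.items():
    print(key, value)
# ('.*ccc-r.*', 44) 244
# ('.*nft-r.*', 23) 90
# ('.*ccc-r.*', 32) 16
# ('.*zxy-r.*', 16) 55
```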

Answer 3

You can also try:

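agg = {}

for d in data:
    # Key on both fields; keying on 'regex' alone would merge different ages.
    key = (d['regex'], d['age'])
    if key in agg:
        agg[key]['count'] += d['count']
    else:
        # Copy so the dicts in the original list are not mutated.
        agg[key] = dict(d)

print(list(agg.values()))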
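
One thing worth checking with this kind of loop is the grouping key. The question asks for entries to be merged only when both regex and age match, and keying on regex alone gives different totals. The sketch below compares the two on the question's sample data; `sum_counts` is a hypothetical helper written just for this comparison.

```python
data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

def sum_counts(rows, key_fields):
    """Return {key tuple: summed count} for the given grouping fields."""
    totals = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        totals[key] = totals.get(key, 0) + row["count"]
    return totals

by_regex = sum_counts(data, ("regex",))
by_regex_and_age = sum_counts(data, ("regex", "age"))

# Keyed on regex only, the age-44 and age-32 entries collapse together.
print(by_regex[(".*ccc-r.*",)])              # 260
# Keyed on both fields, they stay separate, as the question requires.
print(by_regex_and_age[(".*ccc-r.*", 44)])   # 244
print(by_regex_and_age[(".*ccc-r.*", 32)])   # 16
```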

Answer 4

If you're not opposed to using a library (and a slightly different output), this can be done nicely with pandas:

import pandas as pd

df = pd.DataFrame(data)
df.groupby(['regex', 'age']).sum()

This yields:

               count
regex     age
.*ccc-r.* 32      16
          44     244
.*nft-r.* 23      90
.*zxy-r.* 16      55

Answer 1
2 2021-05-20 03:00:11

Answer 2
1 Accepted 2021-05-20 03:01:47

Answer 3
1 2021-05-20 03:02:21

Answer 4
0 2021-05-20 03:03:41
