Group By and Aggregate a List of Dictionaries in Python

Question

I have a list of dictionaries which I need to aggregate in Python:

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, 
{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, 
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

and I'm looking to aggregate it by summing budgetImpressions for each campaign.

So the final result should be:

data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 60}, 
{"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

Note that every entry with a given campaignName will always have the same campaignCfid, startDate and endDate.

Can this be done in Python? I've tried using itertools without much success. Would it be a better approach to use Pandas?
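Because the question guarantees that campaignName determines campaignCfid, startDate and endDate, campaignName alone can serve as the grouping key. The following is a minimal plain-Python sketch under that assumption (the function name aggregate_by_campaign is hypothetical): keep a copy of the first row seen for each campaign and add the budgetImpressions of later rows to it. No sorting is needed, and the input dicts are left unchanged.

```python
from typing import Any


def aggregate_by_campaign(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sum budgetImpressions per campaignName, keeping the other fields.

    Assumes every row with a given campaignName has the same campaignCfid,
    startDate and endDate, so the first row seen can represent the group.
    """
    merged: dict[str, dict[str, Any]] = {}
    for row in rows:
        key = row["campaignName"]
        if key not in merged:
            # Copy the first row so the caller's dicts are not mutated.
            merged[key] = dict(row)
        else:
            merged[key]["budgetImpressions"] += row["budgetImpressions"]
    # Dicts keep insertion order (Python 3.7+), so campaigns come out
    # in the order their first row appears in the input.
    return list(merged.values())


data = [
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80},
]

print(aggregate_by_campaign(data))
# [{'startDate': 123, 'endDate': 456, 'campaignName': 'abc', 'campaignCfid': 789, 'budgetImpressions': 60},
#  {'startDate': 456, 'endDate': 789, 'campaignName': 'def', 'campaignCfid': 123, 'budgetImpressions': 80}]
```

If the guarantee did not hold, the key would have to be the full tuple of identifying fields instead of campaignName alone.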

Answer 1

Just to demonstrate that plain Python is sometimes perfectly fine for this kind of thing:

In [11]: from collections import Counter
         from itertools import groupby

In [12]: data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

In [13]: g = groupby(data, lambda x: x.pop('campaignName'))

In [14]: d = {}
         for campaign, campaign_data in g:
             c = Counter()
             for row in campaign_data: c.update(row)
             d[campaign] = c  # if you want a dict rather than a Counter, store dict(c) here

In [15]: d
Out[15]:
{'abc': Counter({'campaignCfid': 1578, 'endDate': 912, 'startDate': 246, 'budgetImpressions': 60}),
 'def': Counter({'endDate': 789, 'startDate': 456, 'campaignCfid': 123, 'budgetImpressions': 80})}

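If you already have this collection of lists/dicts, it doesn't really make sense to promote it to a DataFrame; it's often cheaper to stay in pure Python. Be aware, though, that Counter.update adds up every numeric field, so startDate, endDate and campaignCfid are summed as well (campaignCfid comes out as 1578 for 'abc'), and that groupby only merges consecutive rows, so the input must be sorted by campaignName first unless it already is. The pop in the key function also removes campaignName from the original dicts.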
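Answer 1's groupby idea can be adapted to produce exactly the output requested in the question. This is a minimal sketch, not the answerer's code: it sorts by campaignName so that groupby sees each campaign's rows consecutively, takes the non-summed fields from the first row of each group (relying on the question's guarantee that they are identical within a campaign), and sums only budgetImpressions. operator.itemgetter replaces pop, so the original dicts are not modified.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80},
]

by_name = itemgetter("campaignName")
result = []
# groupby only merges *consecutive* rows with equal keys, so sort first.
for _name, group in groupby(sorted(data, key=by_name), key=by_name):
    rows = list(group)
    # Non-summed fields are identical within a campaign; copy them from the first row.
    merged = dict(rows[0])
    merged["budgetImpressions"] = sum(r["budgetImpressions"] for r in rows)
    result.append(merged)

print(result)
```

sorted is stable, so rows within each campaign keep their original relative order.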

Answer 2

Yes, use pandas; it handles this well. You can use the groupby functionality to group on the identifying columns and aggregate by sums, then convert the output back to a list of dicts if that is exactly what you want.

import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": 'def',
         "campaignCfid": 123, "budgetImpressions": 80}]

df = pd.DataFrame(data)

grouped = df.groupby(['startDate', 'endDate', 'campaignCfid',
                      'campaignName']).sum()

print(grouped.reset_index().to_dict('records'))

This prints:

[{'startDate': 123, 'endDate': 456, 'campaignCfid': 789, 'campaignName': 'abc', 'budgetImpressions': 60}, {'startDate': 456, 'endDate': 789, 'campaignCfid': 123, 'campaignName': 'def', 'budgetImpressions': 80}]
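A variant of Answer 2, sketched under the question's guarantee that campaignName determines the other fields: group on campaignName alone and carry the remaining identifying columns through with the "first" aggregation, summing only budgetImpressions. This is an alternative to grouping on all four columns, not the answerer's code; sort=False keeps campaigns in the order they first appear.

```python
import pandas as pd

data = [
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10},
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50},
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80},
]

df = pd.DataFrame(data)
# campaignName is the only grouping key; the other identifying columns are
# constant within a campaign, so "first" simply carries them through.
grouped = df.groupby("campaignName", as_index=False, sort=False).agg(
    {"startDate": "first", "endDate": "first",
     "campaignCfid": "first", "budgetImpressions": "sum"}
)
records = grouped.to_dict("records")
print(records)
```

With as_index=False, campaignName stays an ordinary column, so no reset_index call is needed before to_dict.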

2 answers

Answer 1: 4 votes, 2014-06-13 04:35:06

Answer 2: 0 votes, accepted, 2014-06-13 00:43:58
