Aggregating multiple columns with one attribute in the agg function

Question

Let's suppose that I have a pandas DataFrame (data_stores) similar to the following:

store| item1 | item2 | item3
------------------------------
1    | 45    | 50    | 53  
1    | 200   | 300   | 250
2    | 20    | 17    | 21  
2    | 300   | 350   | 400

Let's say that I want to aggregate column item1 with the mean and columns item2 and item3 with the sum.

This is commonly done in the following way:

data_stores_total= data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', 'item2': 'sum', 'item3': 'sum' })

However, this cannot be done (more efficiently) in the following way:

 data_stores_total= data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', ['item2', 'item3']: 'sum' })

Nor in the following way, which makes more sense for dictionary keys:

 data_stores_total= data_stores.groupby(['store'], as_index=False).agg({'mean': 'item1', 'sum': ['item2', 'item3']})

Is there any way to aggregate several columns of a DataFrame with the same function without writing a separate dictionary entry in the agg call for each of them?

Answer 1

It is not possible. You can only define a dictionary with functions as keys and lists of column names as values, and then swap the keys and values in a loop:

import pandas as pd

data_stores = pd.DataFrame({'store': [1, 1, 2, 2], 
                           'item1': [45, 200, 20, 300], 
                           'item2': [50, 300, 17, 350], 
                           'item3': [53, 250, 21, 400]})
print (data_stores)
   store  item1  item2  item3
0      1     45     50     53
1      1    200    300    250
2      2     20     17     21
3      2    300    350    400


d = {'mean':'item1', 'sum' : ['item2', 'item3']}

out = {}
for k, v in d.items():
    if isinstance(v, list):
        for x in v:
            out[x] = k
    else:
        out[v] = k

print (out)
{'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print (data_stores_total)
   store  item1  item2  item3
0      1  122.5    350    303
1      2  160.0    367    421

Or:

d = {'mean':['item1'], 'sum' : ['item2', 'item3']}

d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(d1)
print (data_stores_total)
   store  item1  item2  item3
0      1  122.5    350    303
1      2  160.0    367    421
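The two variants above could be wrapped in one small helper. This is only a sketch: the name invert_agg_spec is hypothetical (it is not part of pandas), and it assumes each value is either a single column name or a list of column names.

```python
import pandas as pd


def invert_agg_spec(spec):
    """Turn {func: column or [columns]} into {column: func} for .agg().

    Hypothetical helper, not a pandas function.
    """
    out = {}
    for func, cols in spec.items():
        # Accept a bare column name as well as a list of names
        if isinstance(cols, str):
            cols = [cols]
        for col in cols:
            out[col] = func
    return out


data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})

agg_spec = invert_agg_spec({'mean': 'item1', 'sum': ['item2', 'item3']})
data_stores_total = data_stores.groupby('store', as_index=False).agg(agg_spec)
print(data_stores_total)
```

If the same column appears under two functions, the later entry silently wins here; a stricter version might raise an error instead.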

EDIT:

If you want to aggregate all columns except a few with the same function, you can build the dictionary from all columns, filtering out the exceptions with difference, and then add the missing column: aggregate function pairs:

out = dict.fromkeys(data_stores.columns.difference(['store','item1']), 'sum')
out['item1'] = 'mean'
print (out)
{'item2': 'sum', 'item3': 'sum', 'item1': 'mean'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print (data_stores_total)
   store  item2  item3  item1
0      1    350    303  122.5
1      2    367    421  160.0
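Note that difference returns the remaining column names sorted, and the result follows the dictionary order, which is why item1 ends up last above. If the original column order matters, one possible sketch (assuming store is the only grouping column) is to build the dictionary with a comprehension over the columns and then overwrite the exceptions in place:

```python
import pandas as pd

data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})

# Default every non-key column to 'sum', keeping the DataFrame's column order
out = {col: 'sum' for col in data_stores.columns if col != 'store'}
# Overwriting an existing key keeps its position in the dict
out['item1'] = 'mean'

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print(data_stores_total)
```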

You can also pass a custom function that operates on this column:

def func(x):
    return x.sum() / x.mean()

out = dict.fromkeys(data_stores.columns.difference(['store','item1']), 'sum')
out['item1'] = func
print (out)
{'item2': 'sum', 'item3': 'sum', 'item1': <function func at 0x000000000F3950D0>}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print (data_stores_total)
   store  item2  item3  item1
0      1    350    303      2
1      2    367    421      2
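Because Python functions are hashable, a custom function can also serve as a key in the function-to-columns dictionary from the earlier approach. A minimal sketch, assuming the same func and sample data as above:

```python
import pandas as pd


def func(x):
    # Same custom aggregation as above: sum divided by mean
    return x.sum() / x.mean()


data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})

# Keys may be function names (strings) or callables
d = {func: ['item1'], 'sum': ['item2', 'item3']}

# Swap keys and values: each column maps to its aggregation
d1 = {col: f for f, cols in d.items() for col in cols}

data_stores_total = data_stores.groupby('store', as_index=False).agg(d1)
print(data_stores_total)
```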
