Aggregate on multiple columns with one attribute at the agg function
Suppose I have a pandas DataFrame (data_stores) similar to the following:
store| item1 | item2 | item3
------------------------------
1 | 45 | 50 | 53
1 | 200 | 300 | 250
2 | 20 | 17 | 21
2 | 300 | 350 | 400
Say I want to aggregate column item1 with the mean, and columns item2 and item3 with the sum.
This is commonly done in the following way:
data_stores_total= data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', 'item2': 'sum', 'item3': 'sum' })
However, it cannot be written more compactly in the following way:
data_stores_total= data_stores.groupby(['store'], as_index=False).agg({'item1': 'mean', ['item2', 'item3']: 'sum' })
nor in the following way, which would make more sense for dictionary keys:
data_stores_total= data_stores.groupby(['store'], as_index=False).agg({'mean': 'item1', 'sum': ['item2', 'item3']})
Is there any way to aggregate several columns of a DataFrame with the same function without writing a separate dictionary entry in the agg function for each of them?
It is not possible directly. You can, however, define a dictionary with the functions as keys and lists of column names as values, and then swap keys and values in a loop:
import pandas as pd

data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})
print (data_stores)
store item1 item2 item3
0 1 45 50 53
1 1 200 300 250
2 2 20 17 21
3 2 300 350 400
d = {'mean': 'item1', 'sum': ['item2', 'item3']}
out = {}
for k, v in d.items():
    if isinstance(v, list):
        for x in v:
            out[x] = k
    else:
        out[v] = k
print (out)
{'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}
data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print (data_stores_total)
store item1 item2 item3
0 1 122.5 350 303
1 2 160.0 367 421
Or, if every value is a list, with a dictionary comprehension:
d = {'mean':['item1'], 'sum' : ['item2', 'item3']}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}
data_stores_total = data_stores.groupby('store', as_index=False).agg(d1)
print (data_stores_total)
store item1 item2 item3
0 1 122.5 350 303
1 2 160.0 367 421
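The two variants above can be wrapped in a small reusable helper. This is only a sketch: the name invert_agg_spec is hypothetical (it is not part of pandas), and it assumes column names are strings. It accepts either a single column name or a list per function, and it raises an error if one column is assigned two functions, which the plain loop would silently overwrite.

```python
def invert_agg_spec(spec):
    """Turn {func: column or list of columns} into {column: func}.

    Hypothetical helper, not a pandas API.
    """
    out = {}
    for func, cols in spec.items():
        # Accept a single column name as well as a list of names.
        if isinstance(cols, str):
            cols = [cols]
        for col in cols:
            if col in out:
                raise ValueError(f"column {col!r} is assigned more than one function")
            out[col] = func
    return out


agg_spec = invert_agg_spec({'mean': 'item1', 'sum': ['item2', 'item3']})
print(agg_spec)
# {'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}
```

The resulting dictionary can be passed to data_stores.groupby('store', as_index=False).agg(agg_spec) exactly like out and d1 above.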
EDIT:
If you want to aggregate all columns except a few with the same aggregate function, you can create a dictionary from all columns, filter out the exceptions with difference, and then add the missing column: aggregate function pairs:
out = dict.fromkeys(data_stores.columns.difference(['store','item1']), 'sum')
out['item1'] = 'mean'
print (out)
{'item2': 'sum', 'item3': 'sum', 'item1': 'mean'}
data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print (data_stores_total)
store item2 item3 item1
0 1 350 303 122.5
1 2 367 421 160.0
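Note that the column order changed in the output above, since difference returns the remaining labels sorted and item1 is added last. If the original order matters, the dictionary can be built by iterating over the columns instead. The sketch below assumes a hypothetical helper named default_agg_spec (not a pandas function) that applies one default function to every non-grouping column unless an override is given.

```python
import pandas as pd

data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})


def default_agg_spec(df, by, default, overrides):
    """Map every non-grouping column to `default`, except those in `overrides`.

    Hypothetical helper, not a pandas API.
    """
    # Iterating over df.columns keeps the original column order.
    return {col: overrides.get(col, default)
            for col in df.columns if col not in by}


out = default_agg_spec(data_stores, by=['store'],
                       default='sum', overrides={'item1': 'mean'})
print(out)
# {'item1': 'mean', 'item2': 'sum', 'item3': 'sum'}

data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print(data_stores_total)
```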
You can also pass a custom function for a column:
def func(x):
    return x.sum() / x.mean()
out = dict.fromkeys(data_stores.columns.difference(['store','item1']), 'sum')
out['item1'] = func
print (out)
{'item2': 'sum', 'item3': 'sum', 'item1': <function func at 0x000000000F3950D0>}
data_stores_total = data_stores.groupby('store', as_index=False).agg(out)
print (data_stores_total)
store item2 item3 item1
0 1 350 303 2
1 2 367 421 2
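Because Python functions are hashable, a custom function can also serve as a key in the function-to-columns dictionary from the first part of the answer. The following is a minimal sketch combining the two ideas; it reuses func from above, whose result (sum divided by mean) is simply the number of rows in each group when the mean is non-zero.

```python
import pandas as pd

data_stores = pd.DataFrame({'store': [1, 1, 2, 2],
                            'item1': [45, 200, 20, 300],
                            'item2': [50, 300, 17, 350],
                            'item3': [53, 250, 21, 400]})


def func(x):
    # sum / mean equals the group size for a non-zero mean
    return x.sum() / x.mean()


# Keys may be function names (strings) or callables; values are lists of columns.
d = {func: ['item1'], 'sum': ['item2', 'item3']}
d1 = {col: f for f, cols in d.items() for col in cols}

data_stores_total = data_stores.groupby('store', as_index=False).agg(d1)
print(data_stores_total)
```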