How to use groupby to apply multiple functions to multiple columns in Pandas?
I have an ordinary DataFrame:
import pandas as pd
A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])
Following this recipe, I got the results I wanted:
In [62]: A.groupby((A['A'] > 2)).apply(lambda x: pd.Series(dict(
up_B=(x.B >= 0).sum(), down_B=(x.B < 0).sum(), mean_B=(x.B).mean(), std_B=(x.B).std(),
up_C=(x.C >= 0).sum(), down_C=(x.C < 0).sum(), mean_C=(x.C).mean(), std_C=(x.C).std())))
Out[62]:
down_B down_C mean_B mean_C std_B std_C up_B up_C
A
False 0 0 4.5 3.000000 0.707107 1.414214 2 2
True 0 0 2.0 2.333333 1.000000 1.527525 3 3
This approach works, but for a large number of columns (15-100) you would have to type every expression into the formula by hand, which is cumbersome.
Given that the same formulas are applied to ALL columns, is there an efficient way to do this for a large number of columns?
Thanks
Since you are aggregating each grouped column into one value, you can use agg instead of apply. The agg method can take a list of functions as input. The functions will be applied to each column:
def up(x):
    # count of non-negative values in the column
    return (x >= 0).sum()

def down(x):
    # count of negative values in the column
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg(
    [up, down, 'mean', 'std'])
print(result)
yields
B C
up down mean std up down mean std
A
False 2 0 4.5 0.707107 2 0 3.000000 1.414214
True 3 0 2.0 1.000000 3 0 2.333333 1.527525
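If the flat names from the question (up_B, mean_C, and so on) are preferred over hierarchical columns, one possible approach is to join the two column levels into a single string. The following is a minimal sketch that rebuilds A and result as above; the variable name flat and the func_col naming pattern are illustrative choices, not part of the original answer.

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    return (x >= 0).sum()

def down(x):
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Each column label is a (column, function) tuple; join it as "function_column".
flat = result.copy()
flat.columns = [f"{func}_{col}" for col, func in result.columns]
print(flat)
```

Because the names are generated from the column labels, the same two lines work unchanged for 15 or 100 columns.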
result has hierarchical ("MultiIndexed") columns. To select a certain column (or columns), you could use:
In [39]: result['B','mean']
Out[39]:
A
False 4.5
True 2.0
Name: (B, mean), dtype: float64
In [46]: result[[('B', 'mean'), ('C', 'mean')]]
Out[46]:
B C
mean mean
A
False 4.5 3.000000
True 2.0 2.333333
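With many columns, listing every (column, 'mean') pair by hand becomes tedious again. One option is to take a cross-section on the inner column level with xs, which returns that statistic for every column at once. A minimal sketch, assuming the same A and result as above (the name means is illustrative):

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    return (x >= 0).sum()

def down(x):
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Select the 'mean' entry of the inner column level (level=1) for all columns.
# The selected level is dropped, so the result has plain columns B and C.
means = result.xs('mean', axis=1, level=1)
print(means)
```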
Or you could move one level of the MultiIndex to the index:
In [40]: result.stack()
Out[40]:
B C
A
False up 2.000000 2.000000
down 0.000000 0.000000
mean 4.500000 3.000000
std 0.707107 1.414214
True up 3.000000 3.000000
down 0.000000 0.000000
mean 2.000000 2.333333
std 1.000000 1.527525
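By default stack moves the innermost column level (here the function names) to the index. Passing level=0 moves the outer level (the original column names) instead, leaving one column per statistic. The following is a sketch under the same assumptions as above; the name by_col is illustrative, and the column order of the output may differ between pandas versions, so it is not shown here.

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    return (x >= 0).sum()

def down(x):
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Move the outer column level (B, C) into the row index.
# Rows are now (group, column) pairs; columns are the four statistics.
by_col = result.stack(level=0)
print(by_col)
```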