
Groupby apply with multiple arguments

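I want to compute the percentile rank of each value within the group it belongs to. I wrote the code below, and it works for calculations such as the z-score because the function takes only one input. What should I do for a function that takes two arguments? Thanks.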

import pandas as pd
import scipy.stats as stats
import numpy as np

funZScore = lambda x: (x - x.mean()) / x.std()
funPercentile = lambda x, y: stats.percentileofscore(x[~np.isnan(x)], y)

A = pd.DataFrame({'Group' : ['A','A','A','A','B','B','B'], 
                  'Value' : [4, 7, None, 6, 2, 8, 1]})

# Compute the Z-score by group
# group_keys=False keeps the original row index, so the result aligns with A
A['Z'] = A.groupby('Group', group_keys=False)['Value'].apply(funZScore)

print(A)
  Group  Value         Z
0     A    4.0 -1.091089
1     A    7.0  0.872872
2     A    NaN       NaN
3     A    6.0  0.218218
4     B    2.0 -0.440225
5     B    8.0  1.144586
6     B    1.0 -0.704361

# compute the percentile rank by group
# how to put two arguments into groupby apply? 
# I hope to get something like below
  Group  Value         Z       P
0     A    4.0 -1.091089   33.33
1     A    7.0  0.872872  100.00
2     A    NaN       NaN     NaN
3     A    6.0  0.218218   66.67
4     B    2.0 -0.440225   66.67
5     B    8.0  1.144586  100.00
6     B    1.0 -0.704361   33.33

I think you need:

d = A.groupby('Group')['Value'].apply(list).to_dict()
print(d)
{'A': [4.0, 7.0, nan, 6.0], 'B': [2.0, 8.0, 1.0]}


A['P'] = A.apply(lambda x: funPercentile(np.array(d[x['Group']]), x['Value']), axis=1)
print(A)
  Group  Value         Z           P
0     A    4.0 -1.091089   33.333333
1     A    7.0  0.872872  100.000000
2     A    NaN       NaN         NaN
3     A    6.0  0.218218   66.666667
4     B    2.0 -0.440225   66.666667
5     B    8.0  1.144586  100.000000
6     B    1.0 -0.704361   33.333333
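
As a possible alternative to building the dictionary first, the same percentile ranks can probably be computed in one pass with groupby and transform. This is only a sketch: the helper name percentile_in_group is made up for illustration, and the explicit NaN check is an assumption added so that missing values stay NaN regardless of the SciPy version in use.

```python
import numpy as np
import pandas as pd
from scipy import stats

A = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'Value': [4, 7, None, 6, 2, 8, 1]})

def percentile_in_group(values):
    """Percentile rank of each value against the non-NaN values of its group.

    Hypothetical helper, not part of pandas or SciPy.
    """
    valid = values.dropna().to_numpy()
    # Score each element against its whole group; leave missing values as NaN.
    return values.map(
        lambda v: stats.percentileofscore(valid, v) if pd.notna(v) else np.nan
    )

# transform calls the helper once per group and returns a Series
# aligned with the original index, so it can be assigned directly.
A['P'] = A.groupby('Group')['Value'].transform(percentile_in_group)
print(A)
```

Here the "second argument" is not passed in separately: the helper receives the whole group and scores every element against it. If a function really does need an extra parameter, apply forwards additional positional and keyword arguments to it, as in apply(func, extra_arg).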
