How to apply a user-defined function column-wise on grouped data in pandas
How can I apply a user-defined function column-wise on grouped data in pandas? The user-defined function returns a Series of fixed shape.
import numpy as np
import pandas as pd

def getStats(col):
    # Return the mean and standard deviation of one column as a labelled Series
    names = ['mean', 'std']
    return pd.Series([np.mean(col), np.std(col)], index=names, name=col.name)

df = pd.DataFrame({'city': ['c1', 'c2', 'c1', 'c2'],
                   'age': [10, 20, 30, 40],
                   'sal': [1000, 2000, 3000, 4000]})
grp_data = df.groupby('city')
grp_data.apply(getStats)
I have tried the snippet above, but I am not getting the result in the expected format:
c1 | mean | x  | y
c2 | std  | x1 | y1
Could you please help with this?
I think a custom function is not necessary here. Instead, aggregate with GroupBy.agg using a list of aggregate functions, reshape with DataFrame.stack, and finally use DataFrame.rename_axis to set the city and level index labels:
# Assign to a new name so the original df stays available for later snippets
df1 = df.groupby('city').agg([np.mean, np.std]).stack().rename_axis(['city', 'level'])
print(df1)
                  age          sal
city level
c1   mean   20.000000  2000.000000
     std    14.142136  1414.213562
c2   mean   30.000000  3000.000000
     std    14.142136  1414.213562
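As a hedged variant of the aggregation above: newer pandas releases may emit a FutureWarning when NumPy callables such as np.mean are passed to agg, and the string names 'mean' and 'std' are one way to avoid that. This is a minimal sketch rather than part of the original answer, and the variable name stats is an assumption. The std reported here is the pandas default, the sample standard deviation (ddof=1), which is why c1 shows 14.142136 for age.

```python
import pandas as pd

df = pd.DataFrame({'city': ['c1', 'c2', 'c1', 'c2'],
                   'age': [10, 20, 30, 40],
                   'sal': [1000, 2000, 3000, 4000]})

# 'stats' is a hypothetical name; string aliases select the pandas
# implementations of mean and std (std uses ddof=1 by default).
stats = (df.groupby('city')
           .agg(['mean', 'std'])
           .stack()                           # move the statistic names into the row index
           .rename_axis(['city', 'level']))   # label both index levels
print(stats)
```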
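Custom functions can be passed in the same list, for example quantiles:
def q(c):
    # Build an aggregation function that returns the c-th quantile
    def f1(x):
        return x.quantile(c)
    # agg uses the function name as the label in the output
    f1.__name__ = f'q{c}'
    return f1

df2 = (df.groupby('city')
         .agg([np.mean, np.std, q(0.25), q(0.75)])
         .stack()
         .rename_axis(['city', 'level']))
print(df2)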
                  age          sal
city level
c1   mean   20.000000  2000.000000
     std    14.142136  1414.213562
     q0.25  15.000000  1500.000000
     q0.75  25.000000  2500.000000
c2   mean   30.000000  3000.000000
     std    14.142136  1414.213562
     q0.25  25.000000  2500.000000
     q0.75  35.000000  3500.000000
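If the user-defined function from the question has to be kept, one possible approach is to call DataFrame.apply inside GroupBy.apply, so that the function receives one column of one group at a time. The following is only a sketch under assumptions: get_stats is a renamed variant of the question's getStats that uses the pandas mean and std methods (so std is the sample standard deviation, matching the output above), and value_cols and per_group are hypothetical names.

```python
import pandas as pd

def get_stats(col):
    # col is a single numeric column (a Series) belonging to one group
    return pd.Series([col.mean(), col.std()],
                     index=['mean', 'std'], name=col.name)

df = pd.DataFrame({'city': ['c1', 'c2', 'c1', 'c2'],
                   'age': [10, 20, 30, 40],
                   'sal': [1000, 2000, 3000, 4000]})

# Hypothetical name: the columns the function should be applied to
value_cols = ['age', 'sal']

# The outer apply runs once per city; the inner apply runs once per column
per_group = (df.groupby('city')[value_cols]
               .apply(lambda g: g.apply(get_stats))
               .rename_axis(['city', 'level']))
print(per_group)
```

Selecting value_cols before apply keeps the grouping column out of the data handed to the function. For large data the built-in aggregations are usually faster, since this version calls Python code for every group and column.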