Pandas groupby and compute ratio of values with NA in multiple columns
I have a dataframe like the one below:
id,status,amount,qty
1,pass,123,4500
1,pass,156,3210
1,fail,687,2137
1,fail,456,1236
2,pass,216,324
2,pass,678,241
2,nan,637,213
2,pass,213,543
df = pd.read_clipboard(sep=',')
I would like to do the following:
a) Group by id and compute the pass percentage for each id
b) Group by id and compute the average amount for each id
So I tried the following:
df['amt_avg'] = df.groupby('id')['amount'].mean()
df['pass_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())
df['fail_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())
but this doesn't work.
I am having trouble getting the pass percentage.
In my real data I have a lot of columns like status for which I have to find the percentage distribution of a specific value (e.g. pass).
I expect my output to look like this:
id,pass_pct,fail_pct,amt_avg
1,50,50,2770.75
2,75,0,330.25
Use crosstab, replacing the missing values in status with the string nan so they are still counted in the row totals, then drop the nan column and add the new amt_avg column with DataFrame.join:
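s = df.groupby('id')['qty'].mean()
# normalize=0 turns the counts into row-wise fractions per id
df = (pd.crosstab(df['id'], df['status'].fillna('nan'), normalize=0)
        .drop(columns='nan')  # the positional axis argument was removed in pandas 2.0
        .mul(100)
        .join(s.rename('amt_avg')))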
print(df)
    fail  pass  amt_avg
id
1   50.0  50.0  2770.75
2    0.0  75.0   330.25
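Since the question mentions many columns like status, the same idea can be wrapped in a loop. The following is only a minimal sketch, not part of the original answer: value_pct is a hypothetical helper name, the sample frame is rebuilt by hand instead of read from the clipboard, and the generated column names (status_pass_pct and so on) are an assumed naming scheme. As in the answer above, amt_avg is taken from qty because that is what matches the expected output.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the missing status is a real NaN.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "status": ["pass", "pass", "fail", "fail", "pass", "pass", np.nan, "pass"],
    "amount": [123, 156, 687, 456, 216, 678, 637, 213],
    "qty": [4500, 3210, 2137, 1236, 324, 241, 213, 543],
})


def value_pct(frame, group_col, value_cols, values):
    """Hypothetical helper: per-group percentage of rows equal to each value.

    Rows with a missing entry stay in the denominator, which is what
    the expected output in the question implies (75 for id 2).
    """
    parts = {}
    for col in value_cols:
        for val in values:
            # eq() returns False for NaN, so missing rows count as "not val".
            is_val = frame[col].eq(val)
            # The mean of a boolean column per group is the fraction of True rows.
            parts[f"{col}_{val}_pct"] = is_val.groupby(frame[group_col]).mean().mul(100)
    return pd.DataFrame(parts)


result = value_pct(df, "id", ["status"], ["pass", "fail"])
# Assumption: average qty, as in the answer above, stored under the name amt_avg.
result["amt_avg"] = df.groupby("id")["qty"].mean()
print(result)
```

To cover more columns, pass them in value_cols, for example value_pct(df, "id", ["status", "other_status"], ["pass"]), where other_status stands for whatever similar columns exist in the real data.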