
Pandas groupby and compute ratio of values with NA in multiple columns

I have a dataframe like the one below:

id,status,amount,qty
1,pass,123,4500
1,pass,156,3210
1,fail,687,2137
1,fail,456,1236
2,pass,216,324
2,pass,678,241
2,nan,637,213
2,pass,213,543

import pandas as pd

df = pd.read_clipboard(sep=',')

I would like to do the following:

a) Group by id and compute the pass percentage for each id

b) Group by id and compute the average amount for each id

So, I tried the below:

df['amt_avg'] = df.groupby('id')['amount'].mean()
df['pass_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())
df['fail_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())

but this doesn't work.

I am having trouble getting the pass percentage.

In my real data I have many columns like status for which I need the percentage distribution of a specific value (e.g. pass).

I expect my output to look like this:

id,pass_pct,fail_pct,amt_avg
1,50,50,2770.75
2,75,0,330.25

Use crosstab with row normalization. Replace the missing values with the string 'nan' first so that they still count toward each row total, drop the resulting nan column, then add the new column amt_avg with DataFrame.join:

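# the expected amt_avg values (2770.75, 330.25) are the per-id means of qty, not amount
s = df.groupby('id')['qty'].mean()

# row-normalized crosstab: NaN becomes its own 'nan' category, which is then dropped
df  = (pd.crosstab(df['id'], df['status'].fillna('nan'), normalize='index')
          .drop(columns='nan')
          .mul(100)
          .join(s.rename('amt_avg')))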

print (df)
    fail  pass  amt_avg
id                     
1   50.0  50.0  2770.75
2    0.0  75.0   330.25
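
Since the question mentions many columns like status, a plain groupby may be easier to reuse than one crosstab per column. The following is only a sketch under a few assumptions: the helper name value_pct is hypothetical (not a pandas function), the data is read from an inline string instead of the clipboard, and amt_avg is taken from qty because that is what the expected output matches.

```python
import io

import pandas as pd

csv_text = """id,status,amount,qty
1,pass,123,4500
1,pass,156,3210
1,fail,687,2137
1,fail,456,1236
2,pass,216,324
2,pass,678,241
2,nan,637,213
2,pass,213,543
"""

# read_csv parses the literal string "nan" as a missing value by default
df = pd.read_csv(io.StringIO(csv_text))


def value_pct(frame, key, col, value):
    """Hypothetical helper: percentage of rows per `key` group where `col` equals `value`.

    Missing values compare as False, so they stay in the denominator.
    """
    return frame[col].eq(value).groupby(frame[key]).mean().mul(100)


result = pd.DataFrame({
    "pass_pct": value_pct(df, "id", "status", "pass"),
    "fail_pct": value_pct(df, "id", "status", "fail"),
    # mean of qty, chosen only because it reproduces the expected amt_avg values
    "amt_avg": df.groupby("id")["qty"].mean(),
})
print(result)
```

For further status-like columns, the same helper can be called with a different col and value; the output column names here follow the expected output in the question.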
