Pandas groupby and compute the ratio of values, with NA, across multiple columns

Question

I have a dataframe like the one below:

id,status,amount,qty
1,pass,123,4500
1,pass,156,3210
1,fail,687,2137
1,fail,456,1236
2,pass,216,324
2,pass,678,241
2,nan,637,213
2,pass,213,543

df = pd.read_clipboard(sep=',')

I would like to do the following:

a) Group by id and compute the pass percentage for each id

b) Group by id and compute the average amount for each id

So I tried the following:

df['amt_avg'] = df.groupby('id')['amount'].mean()
df['pass_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())
df['fail_pct'] = df.groupby('status').apply(lambda x: x['status']/ x['status'].count())

but this doesn't work.

I am having trouble getting the pass percentage.

In my real data I have a lot of columns like status, and for each of them I need the percentage distribution of a specific value (e.g. pass).

I expect my output to look like this:

id,pass_pct,fail_pct,amt_avg
1,50,50,2770.75
2,75,0,330.25

Answer 1

Use crosstab after replacing the missing values with the string 'nan', drop the resulting nan column, and then add the new column amt_avg with DataFrame.join:

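# The expected amt_avg values (2770.75, 330.25) are the per-id mean of qty
s = df.groupby('id')['qty'].mean()

df  = (pd.crosstab(df['id'], df['status'].fillna('nan'), normalize='index')
          .drop(columns='nan')   # pandas 2.0+ no longer accepts the axis positionally
          .mul(100)
          .join(s.rename('amt_avg')))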

print (df)
    fail  pass  amt_avg
id                     
1   50.0  50.0  2770.75
2    0.0  75.0   330.25
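
The question also mentions many status-like columns. As a rough sketch of one alternative (not part of the original answer), the same percentages can be obtained with a plain groupby by taking the mean of a boolean mask. The helper name value_pct and the generated column names (status_pass_pct, status_fail_pct) are assumptions made for illustration, and the data is read from an inline string instead of the clipboard so the example is self-contained.

```python
import io

import pandas as pd

# Sample data from the question; read_csv parses the literal "nan" as missing.
csv_text = """id,status,amount,qty
1,pass,123,4500
1,pass,156,3210
1,fail,687,2137
1,fail,456,1236
2,pass,216,324
2,pass,678,241
2,nan,637,213
2,pass,213,543
"""
df = pd.read_csv(io.StringIO(csv_text))


def value_pct(frame, by, col, values):
    """Per-group percentage of rows where `col` equals each value.

    Hypothetical helper: missing entries never compare equal, so they
    are counted in the denominator but in none of the numerators.
    """
    parts = {
        f"{col}_{v}_pct": frame[col].eq(v).groupby(frame[by]).mean().mul(100)
        for v in values
    }
    return pd.DataFrame(parts)


out = value_pct(df, "id", "status", ["pass", "fail"])
# qty is averaged because that reproduces the expected 2770.75 / 330.25.
out["amt_avg"] = df.groupby("id")["qty"].mean()
print(out)
```

For several columns, value_pct could be called once per column and the results combined with pd.concat(..., axis=1), since every result shares the same id index. For the sample data this gives 50 and 50 for id 1, and 75 and 0 for id 2, matching the crosstab output.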

Answer 1: 3, accepted, 2022-05-25 06:31:18
