Calculating frequency in a Pandas groupby

Question

I have a dataframe that looks like this:

     a     b     c        result
0    80    50    10000    pass
1    80    50    10000    pass
2    100   50    10000    pass
3    100   50    10000    fail
...
XX   110   70    15000    pass
XX   110   70    15000    pass
XX   110   80    10000    fail
XX   110   80    10000    fail

I want to get the 'pass' frequency (as a fraction of the rows) for each combination of (a, b, c) in the dataframe. For example, the dataset above should result in:

     a     b     c        passFreq
0    80    50    10000    1.0
1    100   50    10000    0.5
...
2    110   70    15000    1.0
3    110   80    10000    0.0

If I do

df.groupby(['a', 'b', 'c']).describe()

I get the frequencies, but not in the shape I want, and I'm not sure how to retrieve them and build a new dataframe from them.

Any guidance?

Answer 1

Use crosstab if you need the frequencies for all values of the column result:

print (pd.crosstab([df['a'], df['b'], df['c']], df['result'], normalize=0))
result        fail  pass
a   b  c                
80  50 10000   0.0   1.0
100 50 10000   0.5   0.5
110 70 15000   0.0   1.0
    80 10000   1.0   0.0

df2 = (pd.crosstab([df['a'], df['b'], df['c']], 
                  df['result'], normalize=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df2)
     a   b      c  fail  pass
0   80  50  10000   0.0   1.0
1  100  50  10000   0.5   0.5
2  110  70  15000   0.0   1.0
3  110  80  10000   1.0   0.0
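To try the crosstab approach end to end, here is a minimal sketch. It assumes a dataframe built only from the eight rows visible in the question (the elided rows are not reconstructed, and the default integer index stands in for the original one). The names freq and flat are illustrative.

```python
import pandas as pd

# Only the eight rows shown in the question; the rows hidden behind "..." are left out.
df = pd.DataFrame({
    "a": [80, 80, 100, 100, 110, 110, 110, 110],
    "b": [50, 50, 50, 50, 70, 70, 80, 80],
    "c": [10000, 10000, 10000, 10000, 15000, 15000, 10000, 10000],
    "result": ["pass", "pass", "pass", "fail", "pass", "pass", "fail", "fail"],
})

# normalize="index" is the spelled-out form of normalize=0:
# each (a, b, c) row is divided by its own total, so every row sums to 1.
freq = pd.crosstab([df["a"], df["b"], df["c"]], df["result"], normalize="index")

# Turn the (a, b, c) MultiIndex back into columns and drop the
# leftover "result" label on the column axis.
flat = freq.reset_index().rename_axis(None, axis=1)
print(flat)
```

Writing normalize="index" instead of 0 is only a readability choice; both normalize over each row.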

If you need only pass, first compare the values into a new boolean column and then aggregate with mean:

df1 = (df.assign(new = df['result'].eq('pass'))
         .groupby(['a', 'b', 'c'])['new']
         .mean()
         .reset_index(name='pass'))
print (df1)
     a   b      c  pass
0   80  50  10000   1.0
1  100  50  10000   0.5
2  110  70  15000   1.0
3  110  80  10000   0.0
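The same idea can be sketched without the helper column, naming the output passFreq as in the question. This assumes the eight visible rows from the question as sample data; is_pass and pass_freq are illustrative names.

```python
import pandas as pd

# Only the eight rows shown in the question; the rows hidden behind "..." are left out.
df = pd.DataFrame({
    "a": [80, 80, 100, 100, 110, 110, 110, 110],
    "b": [50, 50, 50, 50, 70, 70, 80, 80],
    "c": [10000, 10000, 10000, 10000, 15000, 15000, 10000, 10000],
    "result": ["pass", "pass", "pass", "fail", "pass", "pass", "fail", "fail"],
})

# True where the row passed; the mean of a boolean Series is the fraction of True values.
is_pass = df["result"].eq("pass")

# Group the boolean Series by the three key columns, average within each group,
# then move the keys back into ordinary columns.
pass_freq = (
    is_pass.groupby([df["a"], df["b"], df["c"]])
    .mean()
    .reset_index(name="passFreq")
)
print(pass_freq)
```

The mean works here only because the column is boolean; taking the mean of the raw 'pass'/'fail' strings would not give a frequency.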

Answer 2

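# result holds strings, so convert it to booleans (True for 'pass') before taking the mean
df.assign(result=df['result'].eq('pass')).groupby(['a', 'b', 'c'])['result'].mean()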

Answer 3

You need to select the column result, convert it to booleans (True for 'pass'), and then apply .mean() and .reset_index() to turn the group keys back into columns. Avoid drop=True here, since it would discard a, b and c:

df.assign(result=df['result'].eq('pass')).groupby(['a', 'b', 'c'])['result'].mean().reset_index()

If you need .describe(), you can do that too.
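As a rough sketch of what .describe() gives on the grouped result column, again assuming only the eight visible rows from the question: for a string column it reports count, unique, top and freq per group. Note that top and freq describe the most common value in each group, not 'pass' specifically, so this is a summary rather than the pass frequency itself.

```python
import pandas as pd

# Only the eight rows shown in the question; the rows hidden behind "..." are left out.
df = pd.DataFrame({
    "a": [80, 80, 100, 100, 110, 110, 110, 110],
    "b": [50, 50, 50, 50, 70, 70, 80, 80],
    "c": [10000, 10000, 10000, 10000, 15000, 15000, 10000, 10000],
    "result": ["pass", "pass", "pass", "fail", "pass", "pass", "fail", "fail"],
})

# One row per (a, b, c) group, summarising the string column "result".
desc = df.groupby(["a", "b", "c"])["result"].describe()
print(desc)
```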

3 answers

Answer 1: score 2, accepted, 2020-11-09 09:16:19
Answer 2: score 0, 2020-11-09 09:15:49
Answer 3: score 0, 2020-11-09 09:17:26
