Calculate the ratio of positive values by group

Question

I'm working with a pandas DataFrame that has the following structure:

import pandas as pd

df = pd.DataFrame({'brand' : ['A', 'A', 'B', 'B', 'C', 'C'], 
                   'target' : [0, 1, 0, 1, 0, 1], 
                   'freq' : [5600, 220, 5700, 90, 5000, 100]})

print(df)
  brand  target  freq
0     A       0  5600
1     A       1   220
2     B       0  5700
3     B       1    90
4     C       0  5000
5     C       1   100

For each brand, I would like to calculate the ratio of positive targets, e.g. for brand A, the ratio of positive targets is 220/(220+5600) = 0.0378.

My resulting DataFrame should look like the following:

  brand  target  freq   ratio
0     A       0  5600  0.0378
1     A       1   220  0.0378
2     B       0  5700  0.0155
3     B       1    90  0.0155
4     C       0  5000  0.0196
5     C       1   100  0.0196

I know that I should group my DataFrame by brand and then apply some function to each group (since I want to keep all rows in my final result, I think I should use transform here). I tried a couple of things without success. Any help is appreciated.

Answer 1

First sort the rows by brand and target so that the target == 1 row is the last row of each group, then divide that last value by the group sum in GroupBy.transform with a lambda function:

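# Sort so that the target == 1 row is last within each brand
df = df.sort_values(['brand','target'])
# Last freq per brand (the positive count) divided by the brand total
df['ratio'] = df.groupby('brand')['freq'].transform(lambda x: x.iat[-1] / x.sum())
print(df)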
  brand  target  freq     ratio
0     A       0  5600  0.037801
1     A       1   220  0.037801
2     B       0  5700  0.015544
3     B       1    90  0.015544
4     C       0  5000  0.019608
5     C       1   100  0.019608
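
The sort is what makes `x.iat[-1]` pick the positive row: with `target` limited to 0 and 1, sorting by `['brand','target']` puts the `target == 1` row last in each group. A small sketch of what happens without it, assuming a reversed copy of the sample data (`shuffled`, `ordered` and `last_over_sum` are illustrative names, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'brand': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'target': [0, 1, 0, 1, 0, 1],
                   'freq': [5600, 220, 5700, 90, 5000, 100]})

# Hypothetical input where the target == 1 row comes first within each brand
shuffled = df.iloc[::-1].reset_index(drop=True)


def last_over_sum(x):
    # Last value of the group divided by the group total
    return x.iat[-1] / x.sum()


# Without sorting, the last row per brand is the target == 0 row,
# so this yields the share of negatives instead of positives
unsorted_ratio = shuffled.groupby('brand')['freq'].transform(last_over_sum)

# After sorting, the last row per brand is the target == 1 row
ordered = shuffled.sort_values(['brand', 'target'])
sorted_ratio = ordered.groupby('brand')['freq'].transform(last_over_sum)

print(unsorted_ratio.round(4).tolist())
print(sorted_ratio.round(4).tolist())
```

Both results keep the index of the frame they were computed from, so assigning `sorted_ratio` back as a column aligns correctly by index.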

Alternatively, divide the Series produced by the functions GroupBy.last and GroupBy.sum, both applied through transform:

df = df.sort_values(['brand','target'])
g = df.groupby('brand')['freq']
df['ratio'] = g.transform('last').div(g.transform('sum'))
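
Both variants above rely on the row order within each brand. If sorting is not desirable, one possible alternative (a sketch, not part of the original answer) is to zero out the non-positive rows with `Series.where` and divide the two group sums. The name `pos_freq` is illustrative, and the sketch assumes `target` only holds 0 and 1:

```python
import pandas as pd

df = pd.DataFrame({'brand': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'target': [0, 1, 0, 1, 0, 1],
                   'freq': [5600, 220, 5700, 90, 5000, 100]})

# Keep freq where target == 1, replace it with 0 everywhere else
pos_freq = df['freq'].where(df['target'].eq(1), 0)

# Positive total per brand divided by overall total per brand,
# both broadcast back to every row by transform
pos_total = pos_freq.groupby(df['brand']).transform('sum')
all_total = df.groupby('brand')['freq'].transform('sum')
df['ratio'] = pos_total / all_total

print(df)
```

This also handles brands with several positive rows or with none at all (the ratio is then 0), cases where taking the last row would give a different answer.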

Answer 1 (2, accepted, 2020-04-08 10:53:24)
