Calculate rate of positive values by group
I'm working with a Pandas DataFrame having the following structure:
import pandas as pd
df = pd.DataFrame({'brand' : ['A', 'A', 'B', 'B', 'C', 'C'],
'target' : [0, 1, 0, 1, 0, 1],
'freq' : [5600, 220, 5700, 90, 5000, 100]})
print(df)
  brand  target  freq
0     A       0  5600
1     A       1   220
2     B       0  5700
3     B       1    90
4     C       0  5000
5     C       1   100
For each brand, I would like to calculate the ratio of positive targets; e.g., for brand A, the ratio of positive targets is 220/(220+5600) = 0.0378.
My resulting DataFrame should look like the following:
  brand  target  freq   ratio
0     A       0  5600  0.0378
1     A       1   220  0.0378
2     B       0  5700  0.0155
3     B       1    90  0.0155
4     C       0  5000  0.0196
5     C       1   100  0.0196
I know that I should group my DataFrame by brand and then apply some function to each group (since I want to keep all rows in my final result, I think I should use transform here). I tested a couple of things but without any success. Any help is appreciated.
First sort the rows by brand and target, so that the target == 1 row is the last row of each group, then divide that last freq value by the group sum in GroupBy.transform with a lambda function:
df = df.sort_values(['brand','target'])
df['ratio'] = df.groupby('brand')['freq'].transform(lambda x: x.iat[-1] / x.sum())
print (df)
  brand  target  freq     ratio
0     A       0  5600  0.037801
1     A       1   220  0.037801
2     B       0  5700  0.015544
3     B       1    90  0.015544
4     C       0  5000  0.019608
5     C       1   100  0.019608
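To make the lambda easier to follow, here is a minimal sketch of what it computes for a single group, using brand A from the question's data. The names freq_a and ratio_a are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'brand': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'target': [0, 1, 0, 1, 0, 1],
                   'freq': [5600, 220, 5700, 90, 5000, 100]})

# After sorting, the target == 1 row is the last row within each brand
df = df.sort_values(['brand', 'target'])

# freq values of brand A only (hypothetical helper variable)
freq_a = df.loc[df['brand'].eq('A'), 'freq']

# Same expression as the lambda body: last value divided by the group total,
# i.e. 220 / (5600 + 220)
ratio_a = freq_a.iat[-1] / freq_a.sum()
print(round(ratio_a, 6))
```

transform then broadcasts this single value back to every row of the group, which is why both rows of brand A get the same ratio.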
Or divide the Series created by the functions GroupBy.last and GroupBy.sum, both called through transform so that the results keep the original row index:
df = df.sort_values(['brand','target'])
g = df.groupby('brand')['freq']
df['ratio'] = g.transform('last').div(g.transform('sum'))
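Both solutions above rely on the sort step putting the target == 1 row last in each group. As a possible alternative, not taken from the original answer, the sketch below masks freq with the target column instead, so it does not depend on row order and would also cover a brand with several positive rows. The variable name pos_freq is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'brand': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'target': [0, 1, 0, 1, 0, 1],
                   'freq': [5600, 220, 5700, 90, 5000, 100]})

# Keep freq where target == 1 and replace it with 0 elsewhere
pos_freq = df['freq'].where(df['target'].eq(1), 0)

# Positive frequency per brand divided by total frequency per brand;
# transform('sum') broadcasts each group total back to every row
df['ratio'] = (pos_freq.groupby(df['brand']).transform('sum')
               / df.groupby('brand')['freq'].transform('sum'))
print(df)
```

On the question's data this should give the same ratio column as the sorted versions, without modifying the row order of df.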