Pandas transform columns into percentage by group
I created the data frame below:
import pandas as pd

# df is the source data frame (not shown here)
gender_mix = pd.DataFrame({
    'user': df.user_type,
    'generation': df.generation,
    'gender': df.gender,
    'record': 1
})\
    .groupby(by=['user', 'generation', 'gender'], as_index=False).agg({'record': 'sum'})\
    .reset_index(drop=True)
          user   generation  gender   record
0     Customer  baby_boomer  Female    19458
1     Customer  baby_boomer    Male    37510
2     Customer        gen_x  Female    75333
3     Customer        gen_x    Male   157443
4     Customer        gen_y  Female   340061
5     Customer        gen_y    Male   607945
6     Customer        gen_z  Female    44980
7     Customer        gen_z    Male    93751
8     Customer       silent  Female      159
9     Customer       silent    Male      608
10  Subscriber  baby_boomer  Female   530056
11  Subscriber  baby_boomer    Male  1695197
12  Subscriber        gen_x  Female  1119945
13  Subscriber        gen_x    Male  3811786
14  Subscriber        gen_y  Female  2319716
15  Subscriber        gen_y    Male  6304151
16  Subscriber        gen_z  Female    74390
17  Subscriber        gen_z    Male   284011
18  Subscriber       silent  Female    20133
19  Subscriber       silent    Male    59013
I would like to calculate the percentage of records by gender within each user and generation group. For example:
user: Customer > generation: baby_boomer > gender: Female 19,458 & Male 37,510. After rounding, Female is 34% and Male is 66% for this user and generation group.
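The arithmetic behind that example can be checked by hand. A minimal sketch, where the variable names `female`, `male` and `total` are only illustrative and the counts are taken from the table above:

```python
# Counts for user=Customer, generation=baby_boomer, copied from the table
female, male = 19458, 37510

# Group total, then each gender's share of it as a rounded percentage
total = female + male
female_pct = round(female * 100 / total)
male_pct = round(male * 100 / total)

print(total, female_pct, male_pct)  # 56968 34 66
```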
Below is my current solution:
# Create a new data frame holding the total record count per (user, generation) group.
# Selecting ['record'] keeps the string column 'gender' out of the sum.
t = gender_mix.groupby(by=['user', 'generation'])['record'].sum()\
    .reset_index()\
    .rename(columns={'record': 'total_by_gen'})
# Merge the original data frame with 't',
# then compute the new variable 'percent' by dividing 'record' by 'total_by_gen'
gender_mix = pd.merge(left=gender_mix, right=t, on=['user', 'generation'])\
    .assign(percent=lambda data: data.record * 100 / data.total_by_gen)\
    .assign(percent=lambda data: data.percent.round().astype('int'))
Here is part of the new data frame:
       user   generation  gender  record  total_by_gen  percent
0  Customer  baby_boomer  Female   19458         56968       34
1  Customer  baby_boomer    Male   37510         56968       66
2  Customer        gen_x  Female   75333        232776       32
3  Customer        gen_x    Male  157443        232776       68
4  Customer        gen_y  Female  340061        948006       36
5  Customer        gen_y    Male  607945        948006       64
6  Customer        gen_z  Female   44980        138731       32
7  Customer        gen_z    Male   93751        138731       68
8  Customer       silent  Female     159           767       21
9  Customer       silent    Male     608           767       79
Is there a way to convert the 'record' column in the original data frame to 'percentage by gender' directly by applying a function, without the intermediate merge?
You can use transform after the groupby and assign the result directly to the column 'record':
gender_mix['record'] = gender_mix\
    .groupby(['user', 'generation'])['record']\
    .transform(lambda x: (x / x.sum() * 100).round().astype(int))
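To try the idea without the original `df`, here is a self-contained sketch on four rows copied from the question's table. It is a variation rather than the exact code above: it assumes you would rather keep the raw counts, so it writes to a separate 'percent' column, and it uses `transform('sum')` to broadcast the group totals instead of a lambda. The name `group_total` is only illustrative.

```python
import pandas as pd

# Small sample copied from the question's table (two user/generation groups)
gender_mix = pd.DataFrame({
    'user': ['Customer'] * 4,
    'generation': ['baby_boomer', 'baby_boomer', 'silent', 'silent'],
    'gender': ['Female', 'Male', 'Female', 'Male'],
    'record': [19458, 37510, 159, 608],
})

# transform('sum') returns a Series aligned with the original rows,
# holding each row's group total, so no merge is needed
group_total = gender_mix.groupby(['user', 'generation'])['record'].transform('sum')

# Plain column arithmetic on the aligned Series; 'record' is left untouched
gender_mix['percent'] = (gender_mix['record'] * 100 / group_total).round().astype(int)

print(gender_mix)
```

On this sample the 'percent' column comes out as 34, 66, 21 and 79, matching the merged result shown in the question. Note that percentages rounded independently are not guaranteed to add up to exactly 100 within a group, particularly when a group has more than two categories.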