简体   繁体   English

Pandas 按组将列转换为百分比

[英]Pandas transform columns into percentage by group

I created a data frame below:我在下面创建了一个数据框:

gender_mix = pd.DataFrame({
    'user': df.user_type,
    'generation': df.generation,
    'gender': df.gender,
    'record': 1
})\
.groupby(by=['user', 'generation', 'gender'], as_index=False).agg({'record': np.sum})\
.reset_index(drop=True)

        user    generation  gender  record
0   Customer    baby_boomer Female  19458
1   Customer    baby_boomer Male    37510
2   Customer    gen_x   Female  75333
3   Customer    gen_x   Male    157443
4   Customer    gen_y   Female  340061
5   Customer    gen_y   Male    607945
6   Customer    gen_z   Female  44980
7   Customer    gen_z   Male    93751
8   Customer    silent  Female  159
9   Customer    silent  Male    608
10  Subscriber  baby_boomer Female  530056
11  Subscriber  baby_boomer Male    1695197
12  Subscriber  gen_x   Female  1119945
13  Subscriber  gen_x   Male    3811786
14  Subscriber  gen_y   Female  2319716
15  Subscriber  gen_y   Male    6304151
16  Subscriber  gen_z   Female  74390
17  Subscriber  gen_z   Male    284011
18  Subscriber  silent  Female  20133
19  Subscriber  silent  Male    59013

I would like to calculate % of record by gender.我想按性别计算记录的百分比。 For example:例如:

user: Customer > generation: baby_boomer > gender: Female 19,458 & Male 37,510.用户:客户 > 世代:婴儿潮一代 > 性别:女性 19,458 和男性 37,510。 Female is 34% & Male is 66% after rounding for this group of user & generation.在对这组用户和一代进行四舍五入后,女性为 34%,男性为 66%。

Below is my current solution:以下是我目前的解决方案:

# create a new data frame which calculate total record by group of user & generation
t = gender_mix.groupby(by=['user', 'generation']).sum()\
                    .reset_index()\
                    .rename(columns={'record': 'total_by_gen'})

# merge original data frame & 't'
# calculate new variable 'percent' by dividing 'record' with 'total_by_gen'
gender_mix = pd.merge(left=gender_mix, right=t, on=['user', 'generation'])\
                .assign(percent = lambda data: data.record * 100 / data.total_by_gen)\
                .assign(percent = lambda data: data.percent.round().astype('int'))

Here is part of the new data frame.这是新数据框的一部分。

    user        generation  gender  record  total_by_gen    percent
0   Customer    baby_boomer Female  19458          56968    34
1   Customer    baby_boomer Male    37510          56968    66
2   Customer    gen_x       Female  75333          232776   32
3   Customer    gen_x       Male    157443         232776   68
4   Customer    gen_y       Female  340061         948006   36
5   Customer    gen_y       Male    607945         948006   64
6   Customer    gen_z       Female  44980          138731   32
7   Customer    gen_z       Male    93751          138731   68
8   Customer    silent      Female  159               767   21
9   Customer    silent      Male    608               767   79

I wonder if there is a way to convert the 'record' column in original data frame to 'percentage by gender' by applying a function?我想知道是否有办法通过应用 function 将原始数据框中的“记录”列转换为“按性别划分的百分比”?

You can use transform after the groupby and assign the results directly to the column 'record' :您可以在groupby之后使用transform并将结果直接分配给列'record'

gender_mix['record'] = gender_mix\
    .groupby(['user', 'generation'])['record']\
    .transform(lambda x: round((x/sum(x)*100)).astype(int))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM