How to find the ratio in a pandas series for a groupby function

I have used groupby to group the dataset by occupation and gender. Now I want to find the ratio of males to females for each occupation. I can't work out how to proceed.

Here's one way using pandas.pivot_table and vectorised Pandas calculations. Note that this method removes the need to perform a separate groupby.

import pandas as pd

df = pd.DataFrame([['A', 'F'], ['A', 'F'], ['A', 'M'], ['B', 'M'], ['B', 'M'], ['B', 'F'],
                   ['C', 'M'], ['C', 'M'], ['D', 'F']], columns=['Occupation', 'Gender'])

# pivot input dataframe
res = df.pivot_table(index='Occupation', columns='Gender', aggfunc='size', fill_value=0)

# calculate ratios
sums = res[['F', 'M']].sum(axis=1)
res['FemaleRatio'] = res['F'] / sums
res['MaleRatio'] = res['M'] / sums

print(res)

Gender      F  M  FemaleRatio  MaleRatio
Occupation                              
A           2  1     0.666667   0.333333
B           1  2     0.333333   0.666667
C           0  2     0.000000   1.000000
D           1  0     1.000000   0.000000
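Note that FemaleRatio and MaleRatio above are shares of each occupation's total, not a male-to-female ratio in the literal sense. If the literal ratio is what you're after, here is a minimal sketch on the same toy data. It swaps in pd.crosstab for the pivot (an alternative, not what the answer above uses), and the names counts, shares and m_to_f are only illustrative. It assumes you would rather get NaN than inf when an occupation has no women.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['A', 'F'], ['A', 'F'], ['A', 'M'], ['B', 'M'], ['B', 'M'], ['B', 'F'],
     ['C', 'M'], ['C', 'M'], ['D', 'F']],
    columns=['Occupation', 'Gender'],
)

# Count rows per (Occupation, Gender); absent combinations are filled with 0.
counts = pd.crosstab(df['Occupation'], df['Gender'])

# Row-wise shares: normalize='index' makes every row sum to 1.
shares = pd.crosstab(df['Occupation'], df['Gender'], normalize='index')

# Literal male-to-female ratio. Replacing a zero female count with NaN
# avoids a division by zero (the result is NaN rather than inf).
m_to_f = counts['M'] / counts['F'].replace(0, np.nan)

print(shares)
print(m_to_f)
```

For occupation B (one F, two M) the ratio comes out as 2.0, and for C (no F at all) it is NaN.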

Maybe quite late to the party, but here's what I believe is the exact answer:

# create pivot
male_ratio = users.pivot_table(index='occupation', columns='gender', aggfunc='size', fill_value=0)

# calculate the male share of each occupation, as a percentage
sums = male_ratio[['F', 'M']].sum(axis=1)
male_ratio['MaleRatio'] = (100 * male_ratio['M'] / sums).round(1)

# result
male_ratio['MaleRatio']

occupation
administrator     54.4
artist            53.6
doctor           100.0
educator          72.6
engineer          97.0
entertainment     88.9
executive         90.6
healthcare        31.2
homemaker         14.3
lawyer            83.3
librarian         43.1
marketing         61.5
none              55.6
other             65.7
programmer        90.9
retired           92.9
salesman          75.0
scientist         90.3
student           69.4
technician        96.3
writer            57.8
Name: MaleRatio, dtype: float64

# x: count per (occupation, gender); y: total count per occupation
x = users.groupby(['occupation', 'gender'])['gender'].count()
y = users.groupby(['occupation'])['gender'].count()

# percentage of each gender within its occupation
r = ((x / y) * 100).round(2)
print(r)

occupation     gender
administrator  F          45.57
               M          54.43
artist         F          46.43
               M          53.57
doctor         M         100.00
educator       F          27.37
               M          72.63
engineer       F           2.99
               M          97.01
entertainment  F          11.11
               M          88.89
executive      F           9.38
               M          90.62
