
Pandas groupby using an equation in agg function

Hello, I am trying to group this dataframe by employment and then find the infection rate for each employment type.

The infection rate should be easy: it is the number of infected people divided by the total number of people, but I cannot figure out how to do that part in a single line.

I have this:

infect_df = gdf.groupby('employment').agg(rate=('infected' == 1.0 / 'infected':'size')))

In the dataframe, infected people get a 1.0 rather than a 0.0. I know this is not the right answer; I am just getting tripped up on getting the infected count.

Suppose your dataframe has two columns like this:

from io import StringIO
import pandas as pd

data = StringIO('''
A,0
B,0
A,0
B,1
A,1
A,1
C,1
B,1
C,0
C,0
A,0
B,1
''')
df = pd.read_csv(data,names=['employment','infected'])

You can compute the rate of infected == 1 with:

df.groupby(['employment'])['infected'].apply(lambda x: (x == 1).sum()/len(x))
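If the infected count and the group size are also wanted as separate columns, one possible sketch uses named aggregation and divides afterwards. The column names infected_count, total and rate are made up for illustration, and the data is the same sample as above:

```python
from io import StringIO
import pandas as pd

# Same sample data as above: employment type, infected flag (1 or 0)
data = StringIO("""A,0
B,0
A,0
B,1
A,1
A,1
C,1
B,1
C,0
C,0
A,0
B,1
""")
df = pd.read_csv(data, names=['employment', 'infected'])

# Named aggregation: summing a 0/1 column counts the infected rows,
# and 'size' counts all rows in the group.
summary = df.groupby('employment').agg(
    infected_count=('infected', 'sum'),
    total=('infected', 'size'),
)

# Hypothetical extra column: infected / total per employment type
summary['rate'] = summary['infected_count'] / summary['total']
print(summary)
```

This takes two statements instead of one, but it keeps the numerator and denominator visible, which can help when checking the rate.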

Since your infected column already has 1s and 0s, you can just take the average using mean:

import pandas as pd

df = pd.DataFrame(
    [('Contractor', 1), ('Contractor', 1), ('Contractor', 0),
     ('Staff', 0), ('Staff', 1), ('Staff', 0),
     ('Custodian', 1), ('Custodian', 1), ('Custodian', 1),
     ('Maintenance', 0), ('Maintenance', 0)],
    columns=['employment', 'infected']
)

infection_rate = df.groupby('employment')['infected'].mean().reset_index()

# For Display
print(infection_rate.to_string())

Output:

    employment  infected
0   Contractor  0.666667
1    Custodian  1.000000
2  Maintenance  0.000000
3        Staff  0.333333
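To stay close to the agg(rate=...) shape from the question, the same idea can be sketched with named aggregation. This assumes the infected column holds 1.0/0.0 floats as described in the question; the comparison with 1.0 is only there to make the intent explicit:

```python
import pandas as pd

df = pd.DataFrame(
    [('Contractor', 1.0), ('Contractor', 1.0), ('Contractor', 0.0),
     ('Staff', 0.0), ('Staff', 1.0), ('Staff', 0.0),
     ('Custodian', 1.0), ('Custodian', 1.0), ('Custodian', 1.0),
     ('Maintenance', 0.0), ('Maintenance', 0.0)],
    columns=['employment', 'infected']
)

# Named aggregation: output column 'rate' is computed from 'infected'.
# The mean of a boolean Series is the fraction of True values,
# i.e. infected rows / total rows in each group.
infect_df = df.groupby('employment').agg(
    rate=('infected', lambda s: (s == 1.0).mean())
)
print(infect_df)
```

When the column contains only 1s and 0s, rate=('infected', 'mean') should give the same numbers without the lambda.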
