Pandas groupby using an equation in agg function

Question

Hello, I am trying to group this dataframe by employment and then find the infection rate for each employment type.

The infection rate should be easy: it is the number of infected people divided by the total number of people. But I cannot figure out how to compute it in a single line.

I have this:

infect_df = gdf.groupby('employment').agg(rate=('infected' == 1.0 / 'infected':'size')))

In the dataframe, an infected person gets a 1.0 rather than a 0.0. I know the line above is not the right answer; I am just getting tripped up on getting the infected count.

Answer 1

Suppose your dataframe has two columns like this:

from io import StringIO

import pandas as pd

data = StringIO('''
A,0
B,0
A,0
B,1
A,1
A,1
C,1
B,1
C,0
C,0
A,0
B,1
''')
df = pd.read_csv(data,names=['employment','infected'])

You can compute the rate of infected == 1 in each group with:

df.groupby(['employment'])['infected'].apply(lambda x: (x == 1).sum()/len(x))
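As a quick sanity check on the expression above, here is a minimal sketch that rebuilds the same sample data and compares the lambda result with a manual "infected / total" calculation. The variable names rate, grouped, and manual are only illustrative and are not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Same sample data as above: employment type, infected flag (0 or 1)
data = StringIO("""A,0
B,0
A,0
B,1
A,1
A,1
C,1
B,1
C,0
C,0
A,0
B,1
""")
df = pd.read_csv(data, names=['employment', 'infected'])

# Rate per group using the lambda from the answer
rate = df.groupby('employment')['infected'].apply(lambda x: (x == 1).sum() / len(x))

# Manual check: sum of the 0/1 flags divided by the group size
grouped = df.groupby('employment')['infected']
manual = grouped.sum() / grouped.size()

print(rate)
print(manual)
```

Both Series should agree, since summing a 0/1 column counts the infected rows.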

Answer 2

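Since your infected column already contains only 1s and 0s, you can just take the average using mean, because the mean of a 0/1 column is the number of 1s divided by the number of rows, which is exactly the infection rate: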

import pandas as pd

df = pd.DataFrame(
    [('Contractor', 1), ('Contractor', 1), ('Contractor', 0),
     ('Staff', 0), ('Staff', 1), ('Staff', 0),
     ('Custodian', 1), ('Custodian', 1), ('Custodian', 1),
     ('Maintenance', 0), ('Maintenance', 0)],
    columns=['employment', 'infected']
)

infection_rate = df.groupby('employment')['infected'].mean().reset_index()

# For Display
print(infection_rate.to_string())

Output:

    employment  infected
0   Contractor  0.666667
1    Custodian  1.000000
2  Maintenance  0.000000
3        Staff  0.333333
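If you would rather keep the agg call from the question, the same idea can probably be written with named aggregation, which takes (column, function) tuples. This is only a sketch on the sample data above, assuming a pandas version that supports named aggregation (0.25 or later). The output column names infected_count, total, and rate are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [('Contractor', 1), ('Contractor', 1), ('Contractor', 0),
     ('Staff', 0), ('Staff', 1), ('Staff', 0),
     ('Custodian', 1), ('Custodian', 1), ('Custodian', 1),
     ('Maintenance', 0), ('Maintenance', 0)],
    columns=['employment', 'infected']
)

# Each keyword is an output column; each tuple is (input column, aggregation)
summary = (
    df.groupby('employment')
      .agg(
          infected_count=('infected', 'sum'),  # number of 1s in the group
          total=('infected', 'size'),          # number of rows in the group
          rate=('infected', 'mean'),           # infected_count / total
      )
      .reset_index()
)

print(summary.to_string())
```

Keeping the count and the total next to the rate makes it easy to confirm that rate equals infected_count / total for every group.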

2 answers

solution1
1 2021-04-25 01:50:07

solution2
1 ACCEPTED 2021-04-25 02:28:12