Hello, I am trying to group this dataframe by employment and then find the infection rate for each employment type.
The infection rate should be simple: infected people / total people. But I cannot figure out how to compute it in a single line.
I have this:
infect_df = gdf.groupby('employment').agg(rate=('infected' == 1.0 / 'infected':'size')))
In the dataframe, infected people get a 1.0 rather than a 0.0. I know this is not the right answer; I am just getting tripped up on getting the infected count.
Suppose your dataframe has two columns like this:
from io import StringIO
import pandas as pd

data = StringIO('''
A,0
B,0
A,0
B,1
A,1
A,1
C,1
B,1
C,0
C,0
A,0
B,1
''')
df = pd.read_csv(data,names=['employment','infected'])
You can compute the rate of infected == 1 per group with:
df.groupby(['employment'])['infected'].apply(lambda x: (x == 1).sum()/len(x))
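If you also want the infected count and the group size next to the rate, a named-aggregation sketch along these lines may help. It assumes pandas 0.25 or newer and that infected holds only 0/1 values. The output column names n_infected, n_total, and rate are made up for illustration.

```python
from io import StringIO
import pandas as pd

# Same sample data as above: employment type, infected flag (0 or 1)
data = StringIO('''A,0
B,0
A,0
B,1
A,1
A,1
C,1
B,1
C,0
C,0
A,0
B,1
''')
df = pd.read_csv(data, names=['employment', 'infected'])

# n_infected / n_total / rate are illustrative names, not from the question
summary = df.groupby('employment').agg(
    n_infected=('infected', 'sum'),  # sum of 0/1 flags = number infected
    n_total=('infected', 'size'),    # number of rows in each group
)
summary['rate'] = summary['n_infected'] / summary['n_total']
print(summary)
```

Keeping the count and the total as separate columns makes it easy to check the rate by hand for each group.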
Since your infected column already has 1s and 0s, you can just take the average using mean:
import pandas as pd
df = pd.DataFrame(
[('Contractor', 1), ('Contractor', 1), ('Contractor', 0),
('Staff', 0), ('Staff', 1), ('Staff', 0),
('Custodian', 1), ('Custodian', 1), ('Custodian', 1),
('Maintenance', 0), ('Maintenance', 0)],
columns=['employment', 'infected']
)
infection_rate = df.groupby('employment')['infected'].mean().reset_index()
# For Display
print(infection_rate.to_string())
Output:
    employment  infected
0   Contractor  0.666667
1    Custodian  1.000000
2  Maintenance  0.000000
3        Staff  0.333333
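To stay close to the agg(rate=...) form attempted in the question, the same mean can probably be written with named aggregation. This is a sketch assuming pandas 0.25 or newer. The second variant, which compares to 1 first, is an extra assumption for the case where the column might hold values other than 0 and 1.

```python
import pandas as pd

df = pd.DataFrame(
    [('Contractor', 1), ('Contractor', 1), ('Contractor', 0),
     ('Staff', 0), ('Staff', 1), ('Staff', 0),
     ('Custodian', 1), ('Custodian', 1), ('Custodian', 1),
     ('Maintenance', 0), ('Maintenance', 0)],
    columns=['employment', 'infected']
)

# Named aggregation: output column `rate` is the mean of `infected` per group
infect_df = df.groupby('employment').agg(rate=('infected', 'mean'))
print(infect_df)

# Variant (assumption: the flag may not be strictly 0/1): compare to 1 first,
# then average the resulting booleans within each employment group
rate_eq1 = df['infected'].eq(1).groupby(df['employment']).mean()
print(rate_eq1)
```

The tuple is (column, aggregation function), so the keyword on the left becomes the name of the result column.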