pandas groupby to calculate percentage of groupby columns
I want to calculate the rate_death percentage as (new_deaths / population) * 100, after grouping by location and summing new_deaths.
Example: for Afghanistan, rate_death should be ((1+4+10) / 38928341) * 100, and for Albania it should be ((0+0+1) / 2877800) * 100.
Below are the data and the approaches I tried, which are not working:
df_data
      location       date  new_cases  new_deaths  population
0  Afghanistan  4/25/2020         70           1    38928341
1  Afghanistan  4/26/2020        112           4    38928341
2  Afghanistan  4/27/2020         68          10    38928341
3      Albania  4/25/2020         15           0     2877800
4      Albania  4/26/2020         34           0     2877800
5      Albania  4/27/2020         14           1     2877800

Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   location    6 non-null      object
 1   date        6 non-null      object
 2   new_cases   6 non-null      int64
 3   new_deaths  6 non-null      int64
 4   population  6 non-null      int64
Approach 1:
df_res = df_data[['location','new_deaths','population']].groupby(['location']).sum()
             new_deaths  population
location
Afghanistan          15   116785023
Albania               1     8633400
df_res['rate_death'] = (df_res['new_deaths'] / df_res['population'] * 100.0)
             new_deaths  population  rate_death
location
Afghanistan          15   116785023       0.000
Albania               1     8633400       0.000
I know that the population is being added up once per row by the groupby with 'sum' above (so it comes out at three times the real value), but I still wonder why rate_death shows as 0.000 instead of the expected percentage.
Approach 2 (tried as described in the post "Pandas percentage of total with groupby"):
location_population = df_data.groupby(['location', 'population']).agg({'new_deaths': 'sum'})
location = df_data.groupby(['location']).agg({'population': 'mean'})
location_population.div(location, level='location') * 100
                        new_deaths  population
location    population
Afghanistan 38928341           NaN         NaN
Albania     2877800            NaN         NaN
But the result comes out as NaN.
Please help: is anything wrong in these approaches, and how can I resolve this? Thanks!
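You can sum new_deaths and take the population only once per location (here with max, since it is the same on every row of a location):
df = df_data.groupby(['location']).agg({'new_deaths': 'sum', 'population': 'max'})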
df['rate_death'] = df['new_deaths'] / df['population'] * 100
Result
             new_deaths  population  rate_death
location
Afghanistan          15    38928341    0.000039
Albania               1     2877800    0.000035
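To make this easy to reproduce, here is a minimal, self-contained sketch that rebuilds the sample data from the question and also looks at the two original attempts. A few things in it are my assumptions rather than anything stated in the question: the variable names res, approach1_rate, deaths, population and rate_series are made up for illustration, and the explanation for the 0.000 in Approach 1 is only a likely one, since the question does not show the display settings in use. The Approach 1 value is roughly 1.28e-05, which prints as 0.000 whenever only three decimals are shown. In Approach 2, dividing one DataFrame by another aligns on column labels as well as the index, and 'new_deaths' and 'population' do not match, which would give all NaN. Dividing Series avoids that.

```python
import pandas as pd

# Sample data copied from the question.
df_data = pd.DataFrame({
    "location": ["Afghanistan"] * 3 + ["Albania"] * 3,
    "date": ["4/25/2020", "4/26/2020", "4/27/2020"] * 2,
    "new_cases": [70, 112, 68, 15, 34, 14],
    "new_deaths": [1, 4, 10, 0, 0, 1],
    "population": [38928341] * 3 + [2877800] * 3,
})

# Sum the deaths, but take the population once per location.
# 'max' assumes population is constant within a location, as in the sample.
res = df_data.groupby("location").agg(
    new_deaths=("new_deaths", "sum"),
    population=("population", "max"),
)
res["rate_death"] = res["new_deaths"] / res["population"] * 100
print(res)

# Approach 1 revisited: the rate is tiny, not zero.
approach1 = df_data.groupby("location")[["new_deaths", "population"]].sum()
approach1_rate = approach1["new_deaths"] / approach1["population"] * 100
print(f"{approach1_rate['Afghanistan']:.3f}")  # three decimals hide the value
print(f"{approach1_rate['Afghanistan']:.3e}")  # scientific notation shows it

# Approach 2 revisited: divide Series (aligned on 'location' only)
# instead of DataFrames whose column labels differ.
deaths = df_data.groupby("location")["new_deaths"].sum()
population = df_data.groupby("location")["population"].mean()
rate_series = deaths / population * 100
print(rate_series)
```

If you need the percentage rounded for display, it is probably safer to round at the very end (for example with more decimals or scientific notation), because values this small disappear at two or three decimal places.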