pandas groupby to calculate percentage of groupby columns

I want to calculate rate_death as a percentage, (new_deaths / population) * 100, after grouping by location and summing new_deaths.

Example: for Afghanistan, rate_death should be ((1+4+10) / 38928341) * 100, and for Albania it should be ((0+0+1) / 2877800) * 100.
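Worked out with the numbers from the example (rounded to four significant figures), both target values are very small percentages:

```latex
\begin{aligned}
\text{rate\_death}_{\text{Afghanistan}} &= \frac{1 + 4 + 10}{38928341} \times 100 \approx 3.853 \times 10^{-5} \\
\text{rate\_death}_{\text{Albania}} &= \frac{0 + 0 + 1}{2877800} \times 100 \approx 3.475 \times 10^{-5}
\end{aligned}
```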

Below are the data and the approaches I tried, which are not working:

df_data
   location     date       new_cases  new_deaths  population
0  Afghanistan  4/25/2020  70         1           38928341
1  Afghanistan  4/26/2020  112        4           38928341
2  Afghanistan  4/27/2020  68         10          38928341
3  Albania      4/25/2020  15         0           2877800
4  Albania      4/26/2020  34         0           2877800
5  Albania      4/27/2020  14         1           2877800
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   location    6 non-null      object
 1   date        6 non-null      object
 2   new_cases   6 non-null      int64 
 3   new_deaths  6 non-null      int64 
 4   population  6 non-null      int64 
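To make the problem reproducible, here is a minimal sketch that rebuilds `df_data` from the table above. Dates are kept as plain strings, as in the question; parsing them is not needed for this calculation.

```python
import pandas as pd

# Rebuild the sample data from the question so the snippets can be run directly.
df_data = pd.DataFrame({
    "location": ["Afghanistan"] * 3 + ["Albania"] * 3,
    "date": ["4/25/2020", "4/26/2020", "4/27/2020"] * 2,
    "new_cases": [70, 112, 68, 15, 34, 14],
    "new_deaths": [1, 4, 10, 0, 0, 1],
    "population": [38928341] * 3 + [2877800] * 3,
})

# Prints a column summary similar to the one in the question.
df_data.info()
```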

Approach 1:

df_res = df_data[['location', 'new_deaths', 'population']].groupby(['location']).sum()

             new_deaths  population
location
Afghanistan          15   116785023
Albania               1     8633400

df_res['rate_death'] = df_res['new_deaths'] / df_res['population'] * 100.0

             new_deaths  population  rate_death
location
Afghanistan          15   116785023       0.000
Albania               1     8633400       0.000

I know that the groupby 'sum' operation also sums the population (once per row, so here it ends up three times too large), but I still wonder why rate_death shows 0.000 instead of the expected percentage.
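A likely explanation, sketched below with the question's data: the computed percentages are not zero, they are just very small (around 1e-5, since deaths are few and the summed population is three times too large), so a display rounded to three decimals prints 0.000. The display settings used in the question are not shown, so the three-decimal rounding here is an assumption; only the columns needed for the calculation are rebuilt.

```python
import pandas as pd

# Sample data from the question (only the columns used here).
df_data = pd.DataFrame({
    "location": ["Afghanistan"] * 3 + ["Albania"] * 3,
    "new_deaths": [1, 4, 10, 0, 0, 1],
    "population": [38928341] * 3 + [2877800] * 3,
})

# Approach 1: .sum() sums every column, so 'population' is added once per row.
df_res = df_data[["location", "new_deaths", "population"]].groupby("location").sum()
print(df_res["population"].tolist())  # [116785023, 8633400], three times the real values

df_res["rate_death"] = df_res["new_deaths"] / df_res["population"] * 100.0

# Rounded to 3 decimals, values around 1e-5 collapse to zero...
print(df_res["rate_death"].round(3).tolist())  # [0.0, 0.0]
# ...while scientific notation shows that they are actually non-zero.
print(df_res["rate_death"].map("{:.3e}".format))
```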

Approach 2 (tried as described in the post "Pandas percentage of total with groupby"):

location_population = df_data.groupby(['location', 'population']).agg({'new_deaths': 'sum'})
location = df_data.groupby(['location']).agg({'population': 'mean'})
location_population.div(location, level='location') * 100

                         new_deaths  population
location    population
Afghanistan 38928341            NaN         NaN
Albania     2877800             NaN         NaN

But the result comes out as NaN.
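The NaN most likely comes from column alignment: when one DataFrame is divided by another, pandas aligns both the rows and the columns, and these two frames share no column name (`new_deaths` vs `population`). The result therefore has both columns, with no matching pair of values in any cell. One way around it, sketched below, is to divide the `new_deaths` Series by the `population` Series while still broadcasting on the `location` level of the index.

```python
import pandas as pd

# Sample data from the question (only the columns used here).
df_data = pd.DataFrame({
    "location": ["Afghanistan"] * 3 + ["Albania"] * 3,
    "new_deaths": [1, 4, 10, 0, 0, 1],
    "population": [38928341] * 3 + [2877800] * 3,
})

location_population = df_data.groupby(["location", "population"]).agg({"new_deaths": "sum"})
location = df_data.groupby(["location"]).agg({"population": "mean"})

# Original attempt: rows align on 'location', but the columns do not match,
# so the result contains the union of both column names, all NaN.
bad = location_population.div(location, level="location") * 100
print(bad)

# Dividing Series by Series avoids the column mismatch; level= still
# broadcasts each location's population over the MultiIndex rows.
rate = location_population["new_deaths"].div(location["population"], level="location") * 100
print(rate)
```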

Please point out what is wrong with these approaches, or how to resolve this. Thanks!

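You can sum new_deaths but take the population only once per location (it is the same on every row of a location), for example with max: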

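# population repeats on every row of a location, so 'max' just picks that single value
df_res = df_data.groupby(['location']).agg({'new_deaths': 'sum', 'population': 'max'})
df_res['rate_death'] = df_res['new_deaths'] / df_res['population'] * 100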

Result:

             new_deaths  population  rate_death
location
Afghanistan          15    38928341    0.000039
Albania               1     2877800    0.000035
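A variation on the answer, as a sketch: named aggregation (available since pandas 0.25) names each output column explicitly. Because the population is identical on every row of a location, `'first'` returns the same value as the `max` used above; it simply states the intent more directly. The loop at the end cross-checks the result against the formulas given in the question.

```python
import math

import pandas as pd

# Sample data from the question (only the columns used here).
df_data = pd.DataFrame({
    "location": ["Afghanistan"] * 3 + ["Albania"] * 3,
    "new_deaths": [1, 4, 10, 0, 0, 1],
    "population": [38928341] * 3 + [2877800] * 3,
})

# Sum the deaths, but take the (repeated) population only once per location.
df_res = df_data.groupby("location").agg(
    new_deaths=("new_deaths", "sum"),
    population=("population", "first"),
)
df_res["rate_death"] = df_res["new_deaths"] / df_res["population"] * 100

# Cross-check against the expected formulas from the question.
expected = {
    "Afghanistan": (1 + 4 + 10) / 38928341 * 100,
    "Albania": (0 + 0 + 1) / 2877800 * 100,
}
for loc, value in expected.items():
    assert math.isclose(df_res.loc[loc, "rate_death"], value)

print(df_res)
```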
