How do I use this pivot table in Pandas to aggregate data by country only?

Question

I am using this Kaggle dataset on the 2014-2016 Ebola outbreak.

https://www.kaggle.com/imdevskp/ebola-outbreak-20142016-complete-dataset

I want to know how to use a pivot table with Pandas to see the total unconfirmed cases (suspected and probable) per country. I'm not sure how to proceed: I have both Country and Date in the index. If I use only Country in the index, things get messed up.

      Country                   Date        Suspected Cases  Probable Cases  Confirmed Cases  Suspected Deaths  Probable Deaths  Confirmed Deaths
0     Guinea                    2014-08-29  25.0             141.0           482.0            2.0               141.0            287.0
1     Nigeria                   2014-08-29  3.0              1.0             15.0             0.0               1.0              6.0
2     Sierra Leone              2014-08-29  54.0             37.0            935.0            8.0               34.0             380.0
3     Liberia                   2014-08-29  382.0            674.0           322.0            168.0             301.0            225.0
4     Sierra Leone              2014-09-05  78.0             37.0            1146.0           11.0              37.0             443.0
...   ...                       ...         ...              ...             ...              ...               ...              ...
2480  Liberia                   2016-03-23  5636.0           1879.0          3151.0           NaN               NaN              NaN
2481  Italy                     2016-03-23  0.0              0.0             1.0              NaN               NaN              NaN
2482  Liberia                   2016-03-23  0.0              3.0             2.0              NaN               3.0              1.0
2483  Nigeria                   2016-03-23  0.0              1.0             19.0             0.0               1.0              7.0
2484  United States of America  2016-03-23  0.0              0.0             4.0              0.0               0.0              1.0
2485 rows × 8 columns

How should I change the pivot table so that I see exactly one total value each for Probable Cases and Suspected Cases in each country? I want to effectively ignore dates.

table = pd.pivot_table(df, index=['Country', 'Date'], columns=None, values=['Probable Cases', 'Suspected Cases'], aggfunc={
    'Suspected Cases' : 'sum',
    'Probable Cases' : 'sum'
})

                                      Probable Cases  Suspected Cases
Country                   Date
Guinea                    2014-08-29  141.0           25.0
                          2014-09-05  152.0           56.0
                          2014-09-08  151.0           47.0
                          2014-09-12  151.0           32.0
                          2014-09-16  162.0           31.0
...                       ...         ...             ...
United States of America  2015-12-17  0.0             0.0
                          2015-12-22  0.0             0.0
                          2015-12-23  0.0             0.0
                          2015-12-29  0.0             0.0
                          2016-03-23  0.0             0.0
2379 rows × 2 columns

Answer 1

If you want to ignore dates, omit 'Date' from the index parameter of pd.pivot_table.

Also, I don't think you need a pivot table. You just need to group by country and then specify the columns you want to sum in the .agg method.

df.groupby('Country').agg({'COL1': 'sum', 'COL2': 'sum'})
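
To make that concrete, here is a minimal sketch using the column names from the question. The small DataFrame is only a stand-in built from the first five sample rows shown above, not the full Kaggle file, and the extra "Unconfirmed Cases" column is a hypothetical name added for illustration.

```python
import pandas as pd

# Stand-in for the Kaggle data: the first five sample rows from the question.
df = pd.DataFrame({
    "Country": ["Guinea", "Nigeria", "Sierra Leone", "Liberia", "Sierra Leone"],
    "Date": ["2014-08-29", "2014-08-29", "2014-08-29", "2014-08-29", "2014-09-05"],
    "Suspected Cases": [25.0, 3.0, 54.0, 382.0, 78.0],
    "Probable Cases": [141.0, 1.0, 37.0, 674.0, 37.0],
})

# Group by country only, so all dates for a country collapse into one row.
totals = df.groupby("Country").agg(
    {"Suspected Cases": "sum", "Probable Cases": "sum"}
)

# Optional: one combined figure per country (suspected + probable).
# "Unconfirmed Cases" is a made-up column name, not part of the dataset.
totals["Unconfirmed Cases"] = totals["Suspected Cases"] + totals["Probable Cases"]

print(totals)
```

Because 'Date' is never part of the grouping key, the result has exactly one row per country.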

Answer 2

Change:

index=['Country', 'Date']

to:

index='Country'
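
Applied to the call from the question, that change might look like the sketch below. The DataFrame is again only a stand-in built from the question's first five sample rows, and passing a single aggfunc string instead of the per-column dict is a simplification on my part; the dict form should behave the same way here.

```python
import pandas as pd

# Stand-in for the Kaggle data: the first five sample rows from the question.
df = pd.DataFrame({
    "Country": ["Guinea", "Nigeria", "Sierra Leone", "Liberia", "Sierra Leone"],
    "Date": ["2014-08-29", "2014-08-29", "2014-08-29", "2014-08-29", "2014-09-05"],
    "Suspected Cases": [25.0, 3.0, 54.0, 382.0, 78.0],
    "Probable Cases": [141.0, 1.0, 37.0, 674.0, 37.0],
})

# With only 'Country' in the index, rows are summed across all dates.
table = pd.pivot_table(
    df,
    index="Country",
    values=["Probable Cases", "Suspected Cases"],
    aggfunc="sum",
)

print(table)
```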
