How to use this pivot table in Pandas to aggregate data by Country only?
I am using this Kaggle dataset on the 2014-2016 Ebola outbreak.
https://www.kaggle.com/imdevskp/ebola-outbreak-20142016-complete-dataset
I want to know how to use a pivot table with Pandas to see the total unconfirmed cases (suspected and probable) per country. I'm not sure how to proceed: I have both Country and Date in the index. If I use only Country in the index, things get messed up.
                       Country        Date  Suspected Cases  Probable Cases  Confirmed Cases  Suspected Deaths  Probable Deaths  Confirmed Deaths
0                       Guinea  2014-08-29             25.0           141.0            482.0               2.0            141.0             287.0
1                      Nigeria  2014-08-29              3.0             1.0             15.0               0.0              1.0               6.0
2                 Sierra Leone  2014-08-29             54.0            37.0            935.0               8.0             34.0             380.0
3                      Liberia  2014-08-29            382.0           674.0            322.0             168.0            301.0             225.0
4                 Sierra Leone  2014-09-05             78.0            37.0           1146.0              11.0             37.0             443.0
...                        ...         ...              ...             ...              ...               ...              ...               ...
2480                   Liberia  2016-03-23           5636.0          1879.0           3151.0               NaN              NaN               NaN
2481                     Italy  2016-03-23              0.0             0.0              1.0               NaN              NaN               NaN
2482                   Liberia  2016-03-23              0.0             3.0              2.0               NaN              3.0               1.0
2483                   Nigeria  2016-03-23              0.0             1.0             19.0               0.0              1.0               7.0
2484  United States of America  2016-03-23              0.0             0.0              4.0               0.0              0.0               1.0
2485 rows × 8 columns
How should I change the pivot table so that I see exactly one total value for Probable Cases and one for Suspected Cases in each country? I want to effectively ignore dates.
table = pd.pivot_table(df, index=['Country', 'Date'], columns=None,
                       values=['Probable Cases', 'Suspected Cases'],
                       aggfunc={'Suspected Cases': 'sum',
                                'Probable Cases': 'sum'})
                                     Probable Cases  Suspected Cases
Country                  Date
Guinea                   2014-08-29           141.0             25.0
                         2014-09-05           152.0             56.0
                         2014-09-08           151.0             47.0
                         2014-09-12           151.0             32.0
                         2014-09-16           162.0             31.0
...                      ...                    ...              ...
United States of America 2015-12-17             0.0              0.0
                         2015-12-22             0.0              0.0
                         2015-12-23             0.0              0.0
                         2015-12-29             0.0              0.0
                         2016-03-23             0.0              0.0
2379 rows × 2 columns
If you want to ignore dates, omit 'Date' from the index parameter of pd.pivot_table.
Also, I don't think you need a pivot table. You can simply group by country and then specify the columns you want to sum in the .agg method.
df.groupby('Country').agg({'COL1': 'sum', 'COL2': 'sum'})
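As a minimal sketch of the groupby approach, here COL1 and COL2 are replaced with the column names from the question. The sample rows are made up for illustration (loosely based on the rows shown in the question), and 'Unconfirmed Cases' is a hypothetical column name that does not exist in the dataset:

```python
import pandas as pd

# Made-up sample using the same column names as the question's dataset.
df = pd.DataFrame({
    "Country": ["Guinea", "Guinea", "Liberia", "Liberia"],
    "Date": ["2014-08-29", "2014-09-05", "2014-08-29", "2014-09-05"],
    "Suspected Cases": [25.0, 56.0, 382.0, 400.0],
    "Probable Cases": [141.0, 152.0, 674.0, 700.0],
})

# One row per country: Date is ignored because it is not a grouping key.
totals = df.groupby("Country").agg(
    {"Suspected Cases": "sum", "Probable Cases": "sum"}
)

# Hypothetical extra column: suspected + probable combined.
totals["Unconfirmed Cases"] = totals["Suspected Cases"] + totals["Probable Cases"]

print(totals)
```

With the real data you would skip the sample DataFrame and call groupby on the df loaded from the Kaggle CSV.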
Change:
index=['Country', 'Date']
to:
index='Country'
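To illustrate this change, here is a small sketch of the pivot_table call with only Country in the index. The sample DataFrame is invented for the example and only mimics the question's column names; with the real dataset you would pass the loaded df instead:

```python
import pandas as pd

# Invented sample data with the question's column names.
df = pd.DataFrame({
    "Country": ["Guinea", "Guinea", "Liberia", "Liberia"],
    "Date": ["2014-08-29", "2014-09-05", "2014-08-29", "2014-09-05"],
    "Suspected Cases": [25.0, 56.0, 382.0, 400.0],
    "Probable Cases": [141.0, 152.0, 674.0, 700.0],
})

# Only Country in the index, so all dates collapse into one row per country.
table = pd.pivot_table(
    df,
    index="Country",
    values=["Probable Cases", "Suspected Cases"],
    aggfunc="sum",
)

print(table)
```

The result has one row per country and one column per value, which should match what the groupby approach produces for the same columns.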