
Adding Multiple Columns in Single groupby in Pandas

[Dataset image]

Please help. I have a dataset with the columns Country and Gas, plus one column per year from 2019 back to 1991. A snapshot of the dataset is attached. I want to add up all the values for each country, column by column. For example, for Afghanistan the value under 2019 should be 56.4 (28.79 + 6.23 + 16.37 + 5.01 = 56.4). I want the same result calculated for every year. I used the code below to get the 2019 data.

df.groupby(by='Country')['2019'].sum() 

This is the output of that code:

Country
---------------------
Afghanistan     56.40
Albania         17.31
Algeria        558.67
Andorra          1.18
Angola         256.10
                ...  
Venezuela      588.72
Vietnam        868.40
Yemen           50.05
Zambia         182.08
Zimbabwe       235.06

I have grouped the data by country and summed the 2019 column, but how do I sum the other years as well in a single line of code?

Please help.

I can list the columns explicitly, as in the code below, to get multiple columns in the result, but writing out every column name would be tedious.

df.groupby(by='Country')[['2019','2018','2017']].sum() 

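If you don't specify the columns, sum() is applied to every column other than the grouping key. Pass numeric_only=True so that only the numeric (year) columns are summed; in pandas 2.0 and later the default no longer drops text columns such as Gas, and their strings get concatenated rather than dropped.

df.groupby(by='Country').sum(numeric_only=True)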

                 2019   2020   ...
Country
Afghanistan     56.40   32.4   ...
Albania         17.31   12.5   ...
Algeria        558.67  241.5   ...
Andorra          1.18    1.5   ...
Angola         256.10   32.1   ...
                ...      ...   ...
Venezuela      588.72  247.3   ...
Vietnam        868.40  323.5   ...
Yemen           50.05   55.7   ...
Zambia         182.08   23.4   ...
Zimbabwe       235.06  199.4   ...
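As a small self-contained check, here is a sketch that sums every year column per country on a toy frame. Only the four Afghanistan 2019 values come from the question; the gas labels, the 2018 numbers, and the Albania rows are invented for illustration.

```python
import pandas as pd

# Toy data: only the four Afghanistan 2019 values are from the question;
# the gas labels and all other numbers are made up for this example.
df = pd.DataFrame({
    "Country": ["Afghanistan"] * 4 + ["Albania"] * 2,
    "Gas": ["CO2", "CH4", "N2O", "Other", "CO2", "CH4"],
    "2019": [28.79, 6.23, 16.37, 5.01, 10.0, 7.31],
    "2018": [27.0, 6.0, 16.0, 5.0, 9.5, 7.0],
})

# Sum all numeric (year) columns per country; the text column Gas is skipped.
totals = df.groupby("Country").sum(numeric_only=True)
print(totals.round(2))
```

The result has one row per country and one column per year, with Country as the index.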

Call reset_index() to turn the Country index back into a regular column:

df.groupby(by='Country').sum(numeric_only=True).reset_index()

Country          2019   2020   ...
Afghanistan     56.40   32.4   ...
Albania         17.31   12.5   ...
Algeria        558.67  241.5   ...
Andorra          1.18    1.5   ...
Angola         256.10   32.1   ...
                ...      ...   ...
Venezuela      588.72  247.3   ...
Vietnam        868.40  323.5   ...
Yemen           50.05   55.7   ...
Zambia         182.08   23.4   ...
Zimbabwe       235.06  199.4   ...
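An alternative to calling reset_index() afterwards is to pass as_index=False to groupby, which keeps Country as a regular column from the start. The sketch below compares the two on a made-up frame (all values are hypothetical).

```python
import pandas as pd

# Hypothetical toy data, not taken from the question's dataset.
df = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Gas": ["x", "y", "x"],
    "2019": [1.0, 2.0, 3.0],
    "2018": [0.5, 0.5, 1.0],
})

# Option 1: aggregate, then move the Country index into a column.
via_reset = df.groupby("Country").sum(numeric_only=True).reset_index()

# Option 2: ask groupby not to use Country as the index at all.
via_flag = df.groupby("Country", as_index=False).sum(numeric_only=True)

print(via_reset)
print(via_flag)
```

Both variants should give a frame with a default integer index and the columns Country, 2019, and 2018.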

You can select the column keys of your dataframe from the 2019 column through the last column like this:

df.groupby(by='Country')[df.keys()[2:]].sum() 

df.keys() returns the dataframe's column labels (the same Index as df.columns, not a plain list), which you can slice from the position of the 2019 key, which is 2, to the end of the column keys.
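To avoid hard-coding the position 2, the start of the slice can be looked up by label. This is a sketch on a made-up frame that assumes the year columns are named with strings such as "2019" and that column names are unique.

```python
import pandas as pd

# Hypothetical toy data with the same column layout as in the question.
df = pd.DataFrame({
    "Country": ["A", "B"],
    "Gas": ["x", "y"],
    "2019": [1.0, 2.0],
    "2018": [3.0, 4.0],
})

# keys() is an alias for the columns attribute and returns an Index.
print(df.keys().equals(df.columns))

# Find the position of the "2019" column by label instead of hard-coding it.
start = df.columns.get_loc("2019")
year_cols = df.columns[start:]

result = df.groupby("Country")[list(year_cols)].sum()
print(result)
```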

Suppose you want to select the columns from 2016 through 1992:

df.groupby(by='Country')[df.keys()[5:-1]].sum() 

You just need to slice the column keys at the correct positions.
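If counting positions feels error-prone, the same selection can be written with column labels. The sketch below uses a made-up frame and assumes the year columns are strings and appear in the order shown; note that a label slice with .loc includes both endpoints, unlike a positional slice.

```python
import pandas as pd

# Hypothetical toy data; a shorter year range than the real dataset.
df = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Gas": ["x", "y", "x"],
    "2019": [1.0, 2.0, 3.0],
    "2018": [1.0, 1.0, 1.0],
    "2017": [2.0, 2.0, 2.0],
    "2016": [4.0, 4.0, 4.0],
})

# Label-based slice: "2018" and "2016" are both included.
year_cols = list(df.loc[:, "2018":"2016"].columns)
subset = df.groupby("Country")[year_cols].sum()
print(subset)

# Another option: pick every column whose name is made up of digits only.
all_years = [c for c in df.columns if str(c).isdigit()]
every_year = df.groupby("Country")[all_years].sum()
print(every_year)
```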
