Pandas DataFrame: how to group by a single column, sum multiple columns and add a new total column?

Question

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas DataFrame that looks like this:

Slno   Date         col2   col3   col4   col5  col6
0     01/02/20      2      1      2      5     d
1     03/02/20      5      1      2      4     g
2     04/02/20      5      1      2      5     h
3     05/02/20      4      1      2      6     e
4     08/02/20      8      1      2      5     g
5     05/02/20      8      1      2      8     r

**I want to group by Date and get the row-wise sum() of col2, col3, col4 and col5 as a new column, Total.**

Here is what I tried:

df_new['Total'] = df.groupby(['Date'], sort=False)[["col2", "col3", "col4", "col5"]].sum(axis=1)

It gives ValueError: Wrong number of items passed 4, placement implies 1
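A likely explanation for this error: `groupby(...)[cols].sum()` returns one row per date and one column per summed column, i.e. a four-column DataFrame, which cannot be stored under the single label Total. The sketch below rebuilds the sample data (assuming Date is a plain string column) and shows the shape; the exact error text differs between pandas versions, so it is not reproduced here.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be a plain string column.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})
cols = ["col2", "col3", "col4", "col5"]

# One row per distinct Date, one column per summed column: a 4-column DataFrame.
grouped = df.groupby("Date", sort=False)[cols].sum()
print(grouped.shape)  # (5, 4): 5 distinct dates, 4 columns

# Summing across those columns collapses them into a single total per date.
per_date_total = grouped.sum(axis=1)
print(per_date_total)
```

Note that `per_date_total` is indexed by Date, not by the original row positions, so attaching it to the original rows still needs a `transform` or a `map` on the Date column.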

I have also tried:

df = (df.groupby(['Date'])
        .agg(Total=('ConfirmedIndianNational', 'ConfirmedForeignNational', 'Cured', 'Deaths', 'sum'))
        .reset_index())

It gives TypeError: aggregate() missing 1 required positional argument: 'arg'
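This particular TypeError suggests a pandas version older than 0.25, where `agg()` did not yet accept named aggregation keywords such as `Total=(...)`. Even on newer versions, each named aggregation takes exactly one (column, function) pair, so several columns cannot be packed into one tuple. A sketch using the sample columns in place of the real ones: compute the row total first as a helper column (`row_total` is a name chosen for this example) and then sum it per date. A version-agnostic variant without named aggregation is included as well.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be a plain string column.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})
cols = ["col2", "col3", "col4", "col5"]

# Named aggregation (pandas >= 0.25): one (column, function) pair per output column,
# so the per-row total is built first as a hypothetical helper column "row_total".
result = (df.assign(row_total=df[cols].sum(axis=1))
            .groupby("Date")
            .agg(Total=("row_total", "sum"))
            .reset_index())
print(result)

# Version-agnostic alternative: group the row totals by Date and sum them.
result_old = (df[cols].sum(axis=1)
                .groupby(df["Date"])
                .sum()
                .rename("Total")
                .reset_index())
print(result_old)
```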

I am new to Python and have searched through every solution I could find, but none of them worked.

Answer 1

You can set Date as the index, take the sum of the columns along axis=1, then group by level=0 and transform with 'sum':

df['Total'] = (df.set_index('Date')[["col2", "col3", "col4", "col5"]].sum(axis=1)
               .groupby(level=0).transform('sum').to_numpy())

print(df)

   Slno      Date  col2  col3  col4  col5 col6  Total
0     0  01/02/20     2     1     2     5    d     10
1     1  03/02/20     5     1     2     4    g     12
2     2  04/02/20     5     1     2     5    h     13
3     3  05/02/20     4     1     2     6    e     32 # this is duplicated per group
4     4  08/02/20     8     1     2     5    g     16
5     5  05/02/20     8     1     2     8    r     32 # this is duplicated per group
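The same result can be sketched without moving Date into the index: sum each row first, then group that Series by the Date column and broadcast each group's sum back with `transform('sum')`. Because the result keeps the original row index, it can be assigned directly without `.to_numpy()`. The sample data is rebuilt here, assuming Date is a plain string column.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be a plain string column.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})
cols = ["col2", "col3", "col4", "col5"]

# Row totals (indexed like df), grouped by the Date values of the same rows;
# transform('sum') writes each group's total back onto every row of that group.
df["Total"] = df[cols].sum(axis=1).groupby(df["Date"]).transform("sum")
print(df)
```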

Answer 2

It is not clear whether you have multiple values per date. If you do, your groupby should first aggregate each column with whatever function you want, and then perform the row-wise sum on the aggregated result.

For example, if you want the maximum value of each column per date:

max_df = df.groupby('Date')[["col2", "col3", "col4", "col5"]].max()

Then:

max_df.loc[:,'sum_cols'] = max_df[["col2", "col3", "col4", "col5"]].sum(axis = 1)
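A runnable check of these two steps on the sample data (rebuilt here, with Date assumed to be a string column). With max as the per-date aggregate, the two 05/02/20 rows collapse to their column-wise maxima (8, 1, 2, 8), so that date's sum_cols is 19 rather than the combined 32.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be a plain string column.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})
cols = ["col2", "col3", "col4", "col5"]

# Step 1: one row per Date, each column reduced with max.
max_df = df.groupby("Date")[cols].max()

# Step 2: row-wise sum over the aggregated columns.
max_df["sum_cols"] = max_df[cols].sum(axis=1)
print(max_df)
```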

If you only have one row per date, you can simply do:

df.loc[:,'sum_cols'] = df[["col2", "col3", "col4", "col5"]].sum(axis = 1)
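For contrast, a sketch of what the plain row-wise sum gives on the sample data (rebuilt here, Date assumed to be a string column), where 05/02/20 appears twice: each of those rows keeps its own total (13 and 19) instead of the per-date total of 32, which is why this shortcut only fits data with one row per date.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be a plain string column.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})
cols = ["col2", "col3", "col4", "col5"]

# No grouping: every row is summed on its own, duplicated dates are not combined.
df["sum_cols"] = df[cols].sum(axis=1)
print(df[["Date", "sum_cols"]])
```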

Answer 3

Why not simply do:

df['Total'] = (df.groupby('Date').col2.transform('sum') + df.groupby('Date').col3.transform('sum')
               + df.groupby('Date').col4.transform('sum') + df.groupby('Date').col5.transform('sum'))

It should be OK.
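The idea of adding per-date column sums can be sketched as a single call, under the same sample-data assumption (Date as a plain string column): `transform('sum')` on the selected columns returns a frame with the original row index in which each cell holds its group's column total, and summing that frame along axis=1 gives Total. Both 05/02/20 rows receive 32.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be a plain string column.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})
cols = ["col2", "col3", "col4", "col5"]

# Per-date column totals broadcast back to every row, then added across columns.
df["Total"] = df.groupby("Date")[cols].transform("sum").sum(axis=1)
print(df)
```

This performs one groupby instead of four, which avoids regrouping the same data for every column.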

3 answers

Answer 1
Score 1, accepted, 2021-01-31 16:18:29

Answer 2
Score 0, 2021-01-31 16:17:20

Answer 3
Score 0, 2021-01-31 16:18:44
