Pandas dataframe, how can I group by a single column, apply sum to multiple columns, and add a new sum column?

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas dataframe which looks like this:

Slno   Date         col2   col3   col4   col5  col6
0     01/02/20      2      1      2      5     d
1     03/02/20      5      1      2      4     g
2     04/02/20      5      1      2      5     h
3     05/02/20      4      1      2      6     e
4     08/02/20      8      1      2      5     g
5     05/02/20      8      1      2      8     r

**I want to group by Date and get the row-wise sum() of col2, col3, col4, and col5 as a new column, Total.**

Here is what I tried:

df_new['Total'] = df.groupby(['Date'], sort=False)["col2", "col3", "col4", "col5"].sum(axis=1)

It gives ValueError: Wrong number of items passed 4, placement implies 1

I have also tried:

         df = (df.groupby(['Date'])
         .agg(Total=('ConfirmedIndianNational', 'ConfirmedForeignNational', 'Cured', 'Deaths', 'sum'))
         .reset_index())

It gives TypeError: aggregate() missing 1 required positional argument: 'arg'

I am new to Python. I searched through all the solutions I could find, but to no avail.

You can set Date as the index, take the sum of the columns along axis=1, then group by level=0 and transform with sum:

df['Total'] = (df.set_index('Date')[["col2", "col3", "col4", "col5"]].sum(axis=1)
               .groupby(level=0).transform('sum').to_numpy())

print(df)

   Slno      Date  col2  col3  col4  col5 col6  Total
0     0  01/02/20     2     1     2     5    d     10
1     1  03/02/20     5     1     2     4    g     12
2     2  04/02/20     5     1     2     5    h     13
3     3  05/02/20     4     1     2     6    e     32 # this is duplicated per group
4     4  08/02/20     8     1     2     5    g     16
5     5  05/02/20     8     1     2     8    r     32 # this is duplicated per group
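
The same result can be reproduced end to end with the sketch below. It rebuilds the sample frame from the table in the question and, as a variation rather than the answer's exact code, groups the row sums by the Date column directly instead of going through the index. The name `cols` is only an illustrative helper.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})

cols = ["col2", "col3", "col4", "col5"]  # illustrative name, not from the post

# Step 1: sum the four columns within each row.
row_sums = df[cols].sum(axis=1)

# Step 2: total those row sums within each Date and broadcast the
# group total back to every row of that group.
df["Total"] = row_sums.groupby(df["Date"]).transform("sum")

print(df)
```

Because transform returns a Series with the original row index, the assignment lines up without needing to_numpy().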

It is not clear whether you have multiple rows per date. If you do, your group by should first aggregate each column with whatever function you want, and then you can sum across the aggregated columns row by row.

For example, if you want the max value of each column per date:

max_df = df.groupby('Date')[["col2", "col3", "col4", "col5"]].max()

Then:

max_df.loc[:,'sum_cols'] = max_df[["col2", "col3", "col4", "col5"]].sum(axis = 1)

If you only have one row per date, you can do:

df.loc[:,'sum_cols'] = df[["col2", "col3", "col4", "col5"]].sum(axis = 1)
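
The grouped case above (aggregate per date, then sum across columns) can be sketched on the question's sample data as follows. The data is rebuilt from the table in the question, max is just the example aggregation used in this answer, and `cols` is an illustrative name.

```python
import pandas as pd

# Numeric columns rebuilt from the table in the question.
df = pd.DataFrame({
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
})

cols = ["col2", "col3", "col4", "col5"]  # illustrative name, not from the post

# One row per date, holding the max of each column within that date.
max_df = df.groupby("Date")[cols].max()

# Row-wise sum across the aggregated columns.
max_df["sum_cols"] = max_df[cols].sum(axis=1)

print(max_df)
```

Note that the result is indexed by Date and has one row per date, so it is a summary table rather than a new column on the original frame.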

Why don't you do a simple:

g = df.groupby('Date')
df['Total'] = g.col2.transform('sum') + g.col3.transform('sum') + g.col4.transform('sum') + g.col5.transform('sum')

It should be ok.
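
One thing to keep in mind when adding grouped sums together: a plain sum() on a groupby returns one value per date, indexed by Date, not one value per original row. If a table with one row per date is the goal, a minimal sketch might look like this. The sample data is rebuilt from the question, and `cols` and `per_date` are illustrative names.

```python
import pandas as pd

# Numeric columns rebuilt from the table in the question.
df = pd.DataFrame({
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
})

cols = ["col2", "col3", "col4", "col5"]  # illustrative name, not from the post

# Sum every column within each date; sort=False keeps first-seen date order.
per_date = df.groupby("Date", sort=False)[cols].sum()

# Add the per-column sums together to get one Total per date.
per_date["Total"] = per_date.sum(axis=1)

# Turn Date back into an ordinary column.
per_date = per_date.reset_index()

print(per_date)
```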

