Pandas dataframe: how can I group by a single column, sum multiple columns, and add the result as a new column?
This should be an easy one, but somehow I couldn't find a solution that works.
I have a pandas dataframe which looks like this:
Slno Date col2 col3 col4 col5 col6
0 01/02/20 2 1 2 5 d
1 03/02/20 5 1 2 4 g
2 04/02/20 5 1 2 5 h
3 05/02/20 4 1 2 6 e
4 08/02/20 8 1 2 5 g
5 05/02/20 8 1 2 8 r
I want to group by Date and get the row-wise sum() of col2, col3, col4 and col5 as a new column, Total.
Here is what I tried:
df_new['Total'] = df.groupby(['Date'], sort=False)["col2", "col3", "col4", "col5"].sum(axis=1)
It gives ValueError: Wrong number of items passed 4, placement implies 1
I have also tried:
df = (df.groupby(['Date'])
.agg(Total=('ConfirmedIndianNational', 'ConfirmedForeignNational', 'Cured', 'Deaths', 'sum'))
.reset_index())
It gives TypeError: aggregate() missing 1 required positional argument: 'arg'
I am new to Python and have searched for possible solutions, but to no avail.
You can set Date as the index, take the sum of the columns on axis=1, then groupby level=0 and transform with sum:
df['Total'] = (df.set_index('Date')[["col2", "col3","col4", "col5"]].sum(1)
.groupby(level=0).transform('sum').to_numpy())
print(df)
Slno Date col2 col3 col4 col5 col6 Total
0 0 01/02/20 2 1 2 5 d 10
1 1 03/02/20 5 1 2 4 g 12
2 2 04/02/20 5 1 2 5 h 13
3 3 05/02/20 4 1 2 6 e 32 # this is duplicated per group
4 4 08/02/20 8 1 2 5 g 16
5 5 05/02/20 8 1 2 8 r 32 # this is duplicated per group
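The same totals can probably be reached without moving Date into the index. The following is a minimal sketch, assuming the sample data from the question rebuilt by hand; the names cols and row_sums are only illustrative. It sums the four columns row-wise, groups that Series by the Date column, and broadcasts each group's sum back to its rows with transform:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed, not the asker's real file)
df = pd.DataFrame({
    "Slno": [0, 1, 2, 3, 4, 5],
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
    "col6": ["d", "g", "h", "e", "g", "r"],
})

cols = ["col2", "col3", "col4", "col5"]  # illustrative name

# Step 1: row-wise sum of the selected columns (one value per row)
row_sums = df[cols].sum(axis=1)

# Step 2: group the row sums by Date and broadcast each group's total
# back to every row of that group; the original row index is preserved
df["Total"] = row_sums.groupby(df["Date"]).transform("sum")

print(df)
```

Because transform keeps the original row index, the result can be assigned directly without to_numpy().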
It is not clear if you have multiple rows per date. If that is the case, your group by should first aggregate each column with whatever function you want, and then you take the row-wise sum of the aggregated result.
For example, if you want the max value of each column per date:
max_df = df.groupby('Date')[["col2", "col3", "col4", "col5"]].max()
then:
max_df.loc[:,'sum_cols'] = max_df[["col2", "col3", "col4", "col5"]].sum(axis = 1)
If you only have one row per date, you can do:
df.loc[:,'sum_cols'] = df[["col2", "col3", "col4", "col5"]].sum(axis = 1)
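If the goal is one row per date rather than a value repeated on every row, the aggregate-then-sum idea can be sketched as follows. This is only a sketch under assumptions: it uses sum as the aggregation function and hand-typed sample data from the question, and the names per_date and cols are illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed)
df = pd.DataFrame({
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
})

cols = ["col2", "col3", "col4", "col5"]  # illustrative name

# Aggregate each column per date; sort=False keeps dates in order of first appearance
per_date = df.groupby("Date", sort=False)[cols].sum()

# Row-wise sum across the aggregated columns gives one total per date
per_date["Total"] = per_date[cols].sum(axis=1)

# Turn Date back into an ordinary column
per_date = per_date.reset_index()

print(per_date)
```

Swapping sum for max, mean or another aggregation only changes the groupby line; the row-wise sum stays the same.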
Why don't you simply add up the per-date sums of each column and map them back onto the rows by Date:
g = df.groupby('Date')
df['Total'] = df['Date'].map(g.col2.sum() + g.col3.sum() + g.col4.sum() + g.col5.sum())
It should be ok.
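One thing to watch with this approach: the grouped sums are indexed by Date, not by the original row index, so they have to be looked up per row rather than assigned directly. A minimal sketch, assuming the question's sample data typed in by hand (per_date_total is an illustrative name):

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed)
df = pd.DataFrame({
    "Date": ["01/02/20", "03/02/20", "04/02/20", "05/02/20", "08/02/20", "05/02/20"],
    "col2": [2, 5, 5, 4, 8, 8],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": [5, 4, 5, 6, 5, 8],
})

grouped = df.groupby("Date")

# Each term is a Series indexed by Date; adding them aligns on that index
per_date_total = (
    grouped["col2"].sum()
    + grouped["col3"].sum()
    + grouped["col4"].sum()
    + grouped["col5"].sum()
)

# Look up each row's date in the per-date totals
df["Total"] = df["Date"].map(per_date_total)

print(df)
```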