Pandas dataframe: how can I group by multiple columns, apply sum to a specific column, and add a new count column?

Given a dataframe df1 as follows:

Col1    Col2    Col3    Col4    Col5
-------------------------------------
A       1       AA      10      Test1
A       1       AA      5       Test2
A       2       AB      30      Test3
B       4       FF      10      Test4
C       1       HH      4       Test7
C       3       GG      6       Test8
C       3       GG      7       Test9
D       1       AA      4       Test5
D       3       FF      6       Test6
  • I want to group by Col1, Col2 and Col3,

  • Add a new column Count: the size of each group

  • Add a new column Col4_sum: the sum of Col4 in each group


Expected output:

Col1    Col2    Col3    Count   Col4_sum
----------------------------------------
A       1       AA      2       15
A       2       AB      1       30
B       4       FF      1       10
C       1       HH      1       4
C       3       GG      2       13
D       1       AA      1       4
D       3       FF      1       6

I tried:

df1.groupby(['Col1','Col2','Col3']).size()

but I get only the Count column.
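For anyone who wants to reproduce this, here is a minimal sketch that rebuilds df1 from the table above and shows what the attempt returns. The variable names sizes and counts are only illustrative. size() gives a Series with one value per group, and reset_index(name='Count') turns it into a frame with a Count column, but the sum of Col4 is still missing.

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'D', 'D'],
    'Col2': [1, 1, 2, 4, 1, 3, 3, 1, 3],
    'Col3': ['AA', 'AA', 'AB', 'FF', 'HH', 'GG', 'GG', 'AA', 'FF'],
    'Col4': [10, 5, 30, 10, 4, 6, 7, 4, 6],
    'Col5': ['Test1', 'Test2', 'Test3', 'Test4', 'Test7',
             'Test8', 'Test9', 'Test5', 'Test6'],
})

# size() returns a Series indexed by the group keys, one value per group
sizes = df1.groupby(['Col1', 'Col2', 'Col3']).size()

# reset_index(name=...) converts that Series into a frame with a Count column
counts = sizes.reset_index(name='Count')
print(counts)
```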

Use GroupBy.agg with a list of (name, function) tuples to specify each aggregate function together with its new column name:

df = (df1.groupby(['Col1','Col2','Col3'])['Col4']
         .agg([('Count','size'), ('Col4_sum','sum')])
         .reset_index())
print (df)
  Col1  Col2 Col3  Count  Col4_sum
0    A     1   AA      2        15
1    A     2   AB      1        30
2    B     4   FF      1        10
3    C     1   HH      1         4
4    C     3   GG      2        13
5    D     1   AA      1         4
6    D     3   FF      1         6

In pandas 0.25+ you can use named aggregation:

df = (df1.groupby(['Col1','Col2','Col3'])
         .agg(Count=('Col5', 'size'), Col4_sum=('Col4', 'sum'))
         .reset_index())
print (df)
  Col1  Col2 Col3  Count  Col4_sum
0    A     1   AA      2        15
1    A     2   AB      1        30
2    B     4   FF      1        10
3    C     1   HH      1         4
4    C     3   GG      2        13
5    D     1   AA      1         4
6    D     3   FF      1         6

You can use a dict of column names and aggregation functions. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
#         A    B
# max   NaN  8.0
# min   1.0  2.0
# sum  12.0  NaN
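The docs example above aggregates a whole frame. Applied to the grouped df1 from the question, the same dict idea might look like the sketch below. It assumes df1 is rebuilt as shown, and the rename step is an addition that is not part of the docs example: a dict keeps the original column names, so the result has to be renamed afterwards.

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'D', 'D'],
    'Col2': [1, 1, 2, 4, 1, 3, 3, 1, 3],
    'Col3': ['AA', 'AA', 'AB', 'FF', 'HH', 'GG', 'GG', 'AA', 'FF'],
    'Col4': [10, 5, 30, 10, 4, 6, 7, 4, 6],
    'Col5': ['Test1', 'Test2', 'Test3', 'Test4', 'Test7',
             'Test8', 'Test9', 'Test5', 'Test6'],
})

out = (df1.groupby(['Col1', 'Col2', 'Col3'])
          # one aggregation per column: group size via Col5, sum of Col4
          .agg({'Col5': 'size', 'Col4': 'sum'})
          # the dict form keeps the source column names, so rename them
          .rename(columns={'Col5': 'Count', 'Col4': 'Col4_sum'})
          .reset_index())
print(out)
```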

Another solution that's a bit more verbose and hasn't been mentioned is to use the assign function as follows:

# wrap the chain in parentheses so the leading-dot continuation lines parse
df = (df1.assign(Count=df1.groupby(['Col1','Col2','Col3']).Col4.transform('size'))
         .assign(Col4_sum=df1.groupby(['Col1','Col2','Col3']).Col4.transform('sum'))
         .reset_index(drop=True))
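One thing to keep in mind with this approach: transform returns one value per original row, so the frame still has nine rows and keeps Col4 and Col5. A possible way to collapse it to one row per group, sketched here under the assumption that df1 is rebuilt as in the question, is to drop the per-row columns and then remove duplicates. The drop and drop_duplicates steps are an addition to the answer above, not part of it.

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'D', 'D'],
    'Col2': [1, 1, 2, 4, 1, 3, 3, 1, 3],
    'Col3': ['AA', 'AA', 'AB', 'FF', 'HH', 'GG', 'GG', 'AA', 'FF'],
    'Col4': [10, 5, 30, 10, 4, 6, 7, 4, 6],
    'Col5': ['Test1', 'Test2', 'Test3', 'Test4', 'Test7',
             'Test8', 'Test9', 'Test5', 'Test6'],
})

# Group once and reuse the grouped Col4 for both transforms
g = df1.groupby(['Col1', 'Col2', 'Col3'])['Col4']

out = (df1.assign(Count=g.transform('size'), Col4_sum=g.transform('sum'))
          # remove the per-row columns so rows of the same group become identical
          .drop(columns=['Col4', 'Col5'])
          # keep the first row of each group, in order of first appearance
          .drop_duplicates()
          .reset_index(drop=True))
print(out)
```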

This should solve your problem (it returns the sum of Col4 for each group):

df2 = df.groupby(['Col1','Col2','Col3'])['Col4'].agg('sum')

With the agg function and a dictionary, you can customise your output like so:

df.groupby(['Col1','Col2','Col3']).agg({'Col3': ['count'], 'Col4': ['count','sum']})

This should return one group per combination of Col1, Col2, and Col3, aggregating the count for Col3, and the count and sum for Col4.
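When the dict values are lists, the result has two-level (MultiIndex) columns such as ('Col4', 'sum'). To get the flat Count and Col4_sum columns from the question, the levels can be joined, as in the rough sketch below. It assumes a cut-down df1 with only the needed columns and aggregates only Col4. The flattening and renaming steps are an illustration, not part of the answer above.

```python
import pandas as pd

# Example frame from the question, reduced to the columns needed here
df1 = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'B', 'C', 'C', 'C', 'D', 'D'],
    'Col2': [1, 1, 2, 4, 1, 3, 3, 1, 3],
    'Col3': ['AA', 'AA', 'AB', 'FF', 'HH', 'GG', 'GG', 'AA', 'FF'],
    'Col4': [10, 5, 30, 10, 4, 6, 7, 4, 6],
})

# A list of functions per column produces MultiIndex columns:
# ('Col4', 'count') and ('Col4', 'sum')
out = df1.groupby(['Col1', 'Col2', 'Col3']).agg({'Col4': ['count', 'sum']})

# Join the two levels into flat names: 'Col4_count', 'Col4_sum'
out.columns = ['_'.join(col) for col in out.columns]

# Rename the count column to match the requested output and restore the keys
out = out.rename(columns={'Col4_count': 'Count'}).reset_index()
print(out)
```

Note that count skips missing values while size does not. The two agree here only because Col4 has no NaN.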

You can use the function pivot_table:

df = pd.pivot_table(df, index=['Col1', 'Col2', 'Col3'], values='Col4', aggfunc=['count', 'sum']).reset_index()
df.columns = ['Col1', 'Col2', 'Col3', 'Count', 'Col4_sum']

Output:

  Col1  Col2 Col3  Count  Col4_sum
0    A     1   AA      2        15
1    A     2   AB      1        30
2    B     4   FF      1        10
3    C     1   HH      1         4
4    C     3   GG      2        13
5    D     1   AA      1         4
6    D     3   FF      1         6
