
How to add a sum column to a DataFrame based on groups of a specific column?

I have a DataFrame like this:

col1  col2
   a     1
   a     2
   a     3
   b     1
   b     2

I want to group by col1, sum col2 within each group, and then add that sum back to the DataFrame as a new column, giving:

col1  col2  sum
   a     1    6
   a     2    6
   a     3    6
   b     1    3
   b     2    3

Use transform:

df['sum'] = df.groupby('col1')['col2'].transform('sum')
print(df)
  col1  col2  sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
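
To see why transform fits here, it may help to compare it with a plain aggregation. The sketch below rebuilds the sample data from the question; the names per_group and per_row are only illustrative.

```python
import pandas as pd

# Sample data matching the question.
df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# A plain aggregation collapses to one row per group ...
per_group = df.groupby("col1")["col2"].sum()

# ... while transform broadcasts each group's sum back to every original row.
per_row = df.groupby("col1")["col2"].transform("sum")

print(len(per_group), len(per_row))  # 2 5
```

Because per_row shares the index of df, it can be assigned directly as a new column.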

Or map col1 to the aggregated per-group sums:

df['sum'] = df['col1'].map(df.groupby('col1')['col2'].sum())
print(df)
  col1  col2  sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
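
The map approach can be read as a two-step lookup. Here is a minimal sketch with the intermediate Series pulled out; the name totals is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# Step 1: one sum per group, indexed by the distinct col1 values.
totals = df.groupby("col1")["col2"].sum()

# Step 2: look up each row's col1 value in that Series.
df["sum"] = df["col1"].map(totals)
```

Series.map treats the Series passed to it as a lookup table keyed by its index, which is why the group labels line up with the rows.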

Option 1
transform returns a result with the same index as the original object.
I use assign to return a copy of the DataFrame with the new column.
See the split-apply-combine documentation for more information.

df.assign(Sum=df.groupby('col1').col2.transform('sum'))

  col1  col2  Sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
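
One practical difference from df['sum'] = ... is that assign leaves the original DataFrame untouched. This is a small sketch on the question's sample data; the name result is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# assign returns a new DataFrame; df itself is not modified.
result = df.assign(Sum=df.groupby("col1")["col2"].transform("sum"))

print(list(df.columns))      # ['col1', 'col2']
print(list(result.columns))  # ['col1', 'col2', 'Sum']
```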

Option 2
Use join on the result of an ordinary groupby and sum.

df.join(df.groupby('col1').col2.sum().rename('Sum'), on='col1')

  col1  col2  Sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
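
Spelled out step by step, the join looks roughly like the sketch below. The Series needs a name, since that name becomes the new column; totals and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# Per-group sums, indexed by col1 and named so that join can label the column.
totals = df.groupby("col1")["col2"].sum().rename("Sum")

# on='col1' matches the values in df's col1 column against the index of totals.
result = df.join(totals, on="col1")
```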

Option 3
A creative approach with pd.factorize and np.bincount:

import numpy as np

# f holds an integer code per row, u the unique col1 values
f, u = df.col1.factorize()
df.assign(Sum=np.bincount(f, df.col2).astype(df.col2.dtype)[f])

  col1  col2  Sum
0    a     1    6
1    a     2    6
2    a     3    6
3    b     1    3
4    b     2    3
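
The one-liner above packs several steps together. Here is a sketch that separates them, assuming col1 has no missing values (factorize encodes those as -1, which np.bincount rejects). The names codes, uniques and sums are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# Encode each distinct col1 value as an integer, in order of first appearance.
codes, uniques = pd.factorize(df["col1"])    # codes: [0, 0, 0, 1, 1]

# Weighted bincount adds up col2 per code; the result is a float array.
sums = np.bincount(codes, weights=df["col2"])  # [6., 3.]

# Cast back to col2's dtype, then index by codes to get one value per row.
result = df.assign(Sum=sums.astype(df["col2"].dtype)[codes])
```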

