I have a DataFrame like this:
col1 col2
a 1
a 2
a 3
b 1
b 2
What I want is to first group by col1, then sum col2 within each group, and finally add that sum to the DataFrame as a new column to get:
col1 col2 sum
a 1 6
a 2 6
a 3 6
b 1 3
b 2 3
Option 1
transform returns a result with the same index as the original object, so the per-group sums line up with the original rows. I use assign to return a copy of the DataFrame with a new column. See the split-apply-combine documentation for more information.
df.assign(Sum=df.groupby('col1').col2.transform('sum'))
col1 col2 Sum
0 a 1 6
1 a 2 6
2 a 3 6
3 b 1 3
4 b 2 3
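For anyone who wants to run Option 1 from scratch, here is a minimal self-contained sketch. The DataFrame construction and the names group_sum and result are my own additions for illustration; the groupby/transform/assign calls are the ones used above.

```python
import pandas as pd

# Rebuild the example frame from the question (construction is assumed here).
df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# transform('sum') computes one sum per group and broadcasts it back
# onto the original row index, so it has one value per row of df.
group_sum = df.groupby("col1")["col2"].transform("sum")

# assign returns a new DataFrame; df itself is left unchanged.
result = df.assign(Sum=group_sum)
print(result)
```

Because assign returns a copy, you would need to reassign (df = df.assign(...)) if you want the new column to persist on df.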
Option 2
Use join on the result of a normal groupby and sum.
df.join(df.groupby('col1').col2.sum().rename('Sum'), on='col1')
col1 col2 Sum
0 a 1 6
1 a 2 6
2 a 3 6
3 b 1 3
4 b 2 3
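It may help to see the intermediate object that Option 2 joins against. The sketch below splits the one-liner into two steps; the DataFrame construction and the name per_group are assumptions added for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (construction is assumed here).
df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# A normal groupby + sum gives one row per group, indexed by col1.
# rename('Sum') sets the Series name, which becomes the new column name.
per_group = df.groupby("col1")["col2"].sum().rename("Sum")

# join(on='col1') looks up each row's col1 value in per_group's index.
result = df.join(per_group, on="col1")
print(result)
```

The rename matters: without it the Series would still be called col2, and the join would complain about overlapping column names.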
Option 3
A creative approach with pd.factorize and np.bincount:
f, u = df.col1.factorize()
df.assign(Sum=np.bincount(f, df.col2).astype(df.col2.dtype)[f])
col1 col2 Sum
0 a 1 6
1 a 2 6
2 a 3 6
3 b 1 3
4 b 2 3
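Option 3 is the least obvious of the three, so here is a step-by-step sketch of what each piece produces. The DataFrame construction and the names codes, uniques and totals are my own; the technique is the same factorize plus bincount idea shown above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (construction is assumed here).
df = pd.DataFrame({"col1": ["a", "a", "a", "b", "b"],
                   "col2": [1, 2, 3, 1, 2]})

# factorize maps each distinct col1 value to an integer code.
codes, uniques = pd.factorize(df["col1"])

# bincount with weights adds up col2 for each code; the result is float.
totals = np.bincount(codes, weights=df["col2"].to_numpy())

# Cast back to col2's dtype, then index by codes to get one value per row.
result = df.assign(Sum=totals.astype(df["col2"].dtype)[codes])
print(result)
```

One caveat worth keeping in mind: by default factorize encodes missing values as -1, and np.bincount rejects negative inputs, so this approach assumes col1 has no NaN.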