I often find myself doing a groupby-apply on a DataFrame and then merging the result back into the original DataFrame. Here's an example. Suppose df has columns A and B. I want to add another column whose value, for each row, is the sum of column B over all rows that share that row's value in column A. The following does the job, but it is clearly sub-optimal:
df.join(df.groupby('A')['B'].sum(), on='A', rsuffix='_sum')
Is it possible instead to keep the original index in groupby-sum?
You can use groupby.transform:
df['B_sum'] = df.groupby('A').B.transform('sum')
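Demo:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [1, 2, 3, 4]
})
# transform broadcasts each group's sum back to every row of that group
df['B_sum'] = df.groupby('A').B.transform('sum')
df
#    A  B  B_sum
# 0  1  1      3
# 1  1  2      3
# 2  2  3      7
# 3  2  4      7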
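To address the "keep the original index" part of the question directly, here is a small sketch that compares transform with the join-based approach from the question. The string index labels are a hypothetical choice made only to show that the result stays aligned to a non-default index; they are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],  # hypothetical non-default index, for illustration
)

# transform returns a Series with one value per original row,
# indexed like df rather than by the group keys
b_sum = df.groupby("A")["B"].transform("sum")

# the join-based approach from the question, for comparison
joined = df.join(df.groupby("A")["B"].sum(), on="A", rsuffix="_sum")

print(b_sum.index.equals(df.index))       # is the original index preserved?
print((b_sum == joined["B_sum"]).all())   # do both approaches agree?
```

Because the transformed Series shares df's index, it can be assigned straight to a new column without any join or merge step.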