Pandas: combine groupby-apply with join/merge

Question

I often find myself doing a groupby-apply on a dataframe and then merging the result back into the original dataframe. Here's an example. Suppose df has columns A and B. I want to add another column whose value is the sum of column B over all rows that have the same column A value as the current row. The following does the job, but it seems sub-optimal:

df.join(df.groupby('A')['B'].sum(), on='A', rsuffix='_sum')

Is it possible instead to keep the original index in groupby-sum?

Answer 1

You can use groupby.transform, which returns a result aligned with the original index:

df['B_sum'] = df.groupby('A').B.transform('sum')
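To see why this keeps the original index, it may help to compare the two calls side by side. This is a minimal sketch that assumes the same small df used in the demo; the names grouped, per_group and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})

grouped = df.groupby("A")["B"]

# Aggregation: one value per group, indexed by the group key A.
per_group = grouped.sum()

# Transform: one value per original row, indexed like df.
per_row = grouped.transform("sum")

print(per_group.index.tolist())        # group keys
print(per_row.index.equals(df.index))  # same index as df
```

Because per_row shares df's index, it can be assigned directly as a new column without a join.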

Demo:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [1, 2, 3, 4]
})

df['B_sum'] = df.groupby('A').B.transform('sum')

df
#   A   B   B_sum
#0  1   1   3
#1  1   2   3
#2  2   3   7
#3  2   4   7
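As a quick sanity check, the sketch below compares the join from the question with the transform from the answer. The string index labels are an assumption added here to show that the original index survives even when it is not the default 0..n-1 range.

```python
import pandas as pd

# Same data as the demo, but with a non-default index (an assumption
# made for illustration only).
df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]},
    index=["w", "x", "y", "z"],
)

# Approach from the question: aggregate per group, then join back on A.
joined = df.join(df.groupby("A")["B"].sum(), on="A", rsuffix="_sum")

# Approach from the answer: transform yields one value per original row.
transformed = df.groupby("A")["B"].transform("sum")

# Both approaches should give the same per-row sums on the same index.
print(joined["B_sum"].tolist() == transformed.tolist())
print(transformed.index.tolist() == df.index.tolist())
```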

solution1
5 ACCEPTED 2017-07-28 20:05:01
