
Pandas: combine groupby-apply with join/merge

I often find myself doing a groupby-apply on a DataFrame and then merging the result back into the original DataFrame. Here's an example. Suppose df has columns A and B. I want to add a column whose value, for each row, is the sum of column B over all rows that share that row's value of A. The following does the job, but it feels sub-optimal because it first builds a separate per-group result and then has to join it back on A:

df.join(df.groupby('A')['B'].sum(), on='A', rsuffix='_sum')
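For reference, the join above performs the same lookup as a left merge on column A against the per-group totals. Here is a minimal sketch of both forms. It assumes the small sample frame below, and the data and the name sums are illustrative, not taken from the question:

```python
import pandas as pd

# Illustrative sample data with the columns A and B from the question
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

# Per-group totals: a Series named 'B', indexed by the distinct values of A
sums = df.groupby('A')['B'].sum()

# join: look up each row's A value in the index of `sums`;
# the overlapping column name 'B' on the right side becomes 'B_sum'
joined = df.join(sums, on='A', rsuffix='_sum')

# merge: the same lookup written as a left merge of column A against the index of `sums`
merged = df.merge(sums.rename('B_sum'), left_on='A', right_index=True, how='left')

print(joined)
print(joined.equals(merged))
```

Either way an intermediate per-group Series is built and then aligned back to the rows, which is the extra step the question would like to avoid.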

Is it possible instead to keep the original index in groupby-sum?

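You can use groupby.transform, which returns a result aligned to the original index (one value per row rather than one per group), so it can be assigned directly as a new column: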

df['B_sum'] = df.groupby('A').B.transform('sum')
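To see why the transform result can be assigned directly, compare the index of a plain groupby-sum with that of transform('sum'). This is a minimal sketch that reuses the two-column sample data (the values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

# sum() reduces each group to one value, so the index holds the group keys
per_group = df.groupby('A')['B'].sum()
print(per_group.index.tolist())   # [1, 2]

# transform('sum') broadcasts each group's total back to the group's rows,
# so the index is the original row index of df
per_row = df.groupby('A')['B'].transform('sum')
print(per_row.index.tolist())     # [0, 1, 2, 3]
print(per_row.tolist())           # [3, 3, 7, 7]
```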

Demo:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': [1, 2, 3, 4]
})

df['B_sum'] = df.groupby('A').B.transform('sum')

df
#   A   B   B_sum
#0  1   1   3
#1  1   2   3
#2  2   3   7
#3  2   4   7
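Another way to build the same column, offered here as an alternative rather than as part of the answer above, is to compute the per-group totals once and then look up each row's A value with Series.map. This minimal sketch assumes the same sample data but gives the frame a non-default string index (the labels are illustrative) to show that both results stay aligned to the original rows:

```python
import pandas as pd

# Same data as the demo, but with a hypothetical non-default index
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]},
                  index=['w', 'x', 'y', 'z'])

# transform: one value per original row, indexed like df
via_transform = df.groupby('A')['B'].transform('sum')

# map: look up each row's A value in the per-group totals
totals = df.groupby('A')['B'].sum()
via_map = df['A'].map(totals)

df['B_sum'] = via_transform
print(df)
print((via_transform == via_map).all())
```

transform keeps the grouping and broadcasting in a single call. The map version makes the lookup step explicit, which can be handy when the totals Series is needed elsewhere too.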
