Sum up non-unique rows in DataFrame

I have a dataframe like this:

import pandas as pd

id = [1,1,2,3]
x1 = [0,1,1,2]
x2 = [2,3,1,1]

df = pd.DataFrame({'id':id, 'x1':x1, 'x2':x2})

df
id  x1  x2
1   0   2
1   1   3
2   1   1
3   2   1

Some rows have the same id. I want to sum up such rows (over x1 and x2) to obtain a new dataframe with unique ids:

df_new
id  x1  x2
1   1   5
2   1   1
3   2   1

An important detail is that the real number of columns x1, x2, ... is large, so I cannot apply a function that requires manual input of column names.

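As discussed, you can use the pandas groupby function to sum rows that share the same id value. All remaining columns are summed without being named, which covers the case of many columns: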

df.groupby(df.id).sum()
# or
df.groupby('id').sum()
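A minimal runnable sketch, assuming a recent pandas: it rebuilds the question's frame, applies groupby('id').sum(), and checks the result against the expected df_new. It then builds a wide frame with 50 hypothetical value columns (x1 to x50, generated in a loop rather than typed) to show that every non-key column is summed without being listed by name.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'x1': [0, 1, 1, 2],
                   'x2': [2, 3, 1, 1]})

# Sum all non-key columns for each distinct id; id becomes the index
summed = df.groupby('id').sum()

# Expected result from the question (df_new), indexed by id
expected = pd.DataFrame({'x1': [1, 1, 2], 'x2': [5, 1, 1]},
                        index=pd.Index([1, 2, 3], name='id'))
pd.testing.assert_frame_equal(summed, expected)

# Hypothetical wide frame: 50 value columns named programmatically
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.integers(0, 10, size=(6, 50)),
                    columns=[f'x{i}' for i in range(1, 51)])
wide.insert(0, 'id', [1, 1, 2, 2, 3, 3])

# No column names are passed; groupby sums all 50 value columns
wide_summed = wide.groupby('id').sum()
print(wide_summed.shape)  # (3, 50): one row per id, every value column kept
```

If the real frame also contains non-numeric columns, check how your pandas version's sum treats them before relying on this.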

If you don't want id to become the index, you can:

df.groupby('id').sum().reset_index()
# or
df.groupby('id', as_index=False).sum()   # @John_Gait
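A small sketch, assuming the question's data, that checks the two variants above agree: both return id as an ordinary column with a default integer index.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'x1': [0, 1, 1, 2],
                   'x2': [2, 3, 1, 1]})

# Variant 1: aggregate, then move the id index back into a column
via_reset = df.groupby('id').sum().reset_index()

# Variant 2: tell groupby not to use id as the index in the first place
via_as_index = df.groupby('id', as_index=False).sum()

# Both should give the same frame: columns id, x1, x2 and a 0..n-1 index
pd.testing.assert_frame_equal(via_reset, via_as_index)
print(via_as_index)
```

as_index=False skips the intermediate index, so it is the slightly more direct of the two.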

With pivot_table:

In [31]: df.pivot_table(index='id', aggfunc='sum')
Out[31]:
    x1  x2
id
1    1   5
2    1   1
3    2   1
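A sketch comparing the pivot_table call with the groupby approach on the question's data. It passes aggfunc='sum' as a string; recent pandas releases warn when the builtin sum is passed instead, so the string form seems the safer choice. Dtypes are deliberately not compared, only labels and values.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3],
                   'x1': [0, 1, 1, 2],
                   'x2': [2, 3, 1, 1]})

# With no `values` argument, pivot_table aggregates every non-index column
pivoted = df.pivot_table(index='id', aggfunc='sum')

# The same aggregation via groupby, for comparison
grouped = df.groupby('id').sum()

# Compare labels and summed values; dtype differences are ignored here
pd.testing.assert_frame_equal(pivoted, grouped, check_dtype=False)
print(pivoted)
```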
