I have a dataframe like this:
import pandas as pd
id = [1,1,2,3]
x1 = [0,1,1,2]
x2 = [2,3,1,1]
df = pd.DataFrame({'id':id, 'x1':x1, 'x2':x2})
df
id x1 x2
1 0 2
1 1 3
2 1 1
3 2 1
Some rows have the same id. I want to sum up such rows (over x1 and x2) to obtain a new dataframe with unique ids:
df_new
id x1 x2
1 1 5
2 1 1
3 2 1
An important detail is that the real number of columns (x1, x2, ...) is large, so I cannot apply a function that requires manual input of column names.
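As discussed, you can use the pandas groupby function to sum rows based on the id value. Every column other than the grouping key is summed, so no column names need to be listed: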
df.groupby(df.id).sum()
# or
df.groupby('id').sum()
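A minimal, self-contained sketch of the groupby approach, assuming pandas is installed. It rebuilds the question's frame and also builds a hypothetical 50-column frame (random integers via NumPy, purely for illustration) to show that the same call works no matter how many value columns there are.

```python
import numpy as np
import pandas as pd

# The example frame from the question
df = pd.DataFrame({'id': [1, 1, 2, 3], 'x1': [0, 1, 1, 2], 'x2': [2, 3, 1, 1]})

# Sum every non-key column per id; no column names are listed
df_new = df.groupby('id').sum()
print(df_new)

# Hypothetical wide frame: 50 value columns x1..x50 plus an id column
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.integers(0, 10, size=(6, 50)),
                    columns=[f'x{i}' for i in range(1, 51)])
wide.insert(0, 'id', [1, 1, 2, 2, 3, 3])

# The same one-liner handles all 50 columns
wide_sum = wide.groupby('id').sum()
print(wide_sum.shape)
```

Because groupby excludes the key column and aggregates the rest, the column count never appears in the code.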
If you don't want id to become the index, you can:
df.groupby('id').sum().reset_index()
# or
df.groupby('id', as_index=False).sum() # @John_Gait
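A short sketch, assuming a recent pandas, comparing the two ways of keeping id as an ordinary column. Both should produce a frame with columns id, x1, x2 and a default integer index.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3], 'x1': [0, 1, 1, 2], 'x2': [2, 3, 1, 1]})

# Option 1: aggregate, then move 'id' out of the index back into a column
a = df.groupby('id').sum().reset_index()

# Option 2: tell groupby not to use 'id' as the result index at all
b = df.groupby('id', as_index=False).sum()

print(b)
```

The as_index=False form skips the extra reset_index step; otherwise the results match.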
With pivot_table:
In [31]: df.pivot_table(index='id', aggfunc='sum')
Out[31]:
x1 x2
id
1 1 5
2 1 1
3 2 1
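A sketch checking that pivot_table gives the same sums as groupby on this data. It passes aggfunc as the string 'sum'; as far as I know, recent pandas versions emit a FutureWarning when the built-in sum is passed instead, so the string form is the safer choice.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3], 'x1': [0, 1, 1, 2], 'x2': [2, 3, 1, 1]})

# pivot_table with only index='id' aggregates every other column
pivoted = df.pivot_table(index='id', aggfunc='sum')

# groupby equivalent, for comparison
grouped = df.groupby('id').sum()

print(pivoted)
```

For a plain per-key sum, groupby is the more direct tool; pivot_table is handy when you later want a columns= dimension as well.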