I have two pandas dataframes looking like:
df1:
n column1
0 5.0 0.0
1 6.0 0.0
2 7.0 0.0
3 8.0 0.0
4 9.0 0.0
5 10.0 0.0
df2:
n column2
0 6.0 1.0
1 7.0 1.0
2 8.0 1.0
I want to sum column1 and column2 only for rows where n is the same. Desired output looks like:
df3:
n column1
0 5.0 0.0
1 6.0 1.0
2 7.0 1.0
3 8.0 1.0
4 9.0 0.0
5 10.0 0.0
Please note that:
df2
with zeroes and perform a classical sum.
What I've tried so far produces something like:
n column1
0 5.0 1.0
1 6.0 1.0
2 7.0 1.0
3 8.0 NaN
4 9.0 NaN
5 10.0 NaN
This happens because the sum is aligned on the row index by default, rather than on n as I want.
How can I do this with pandas built-in functions?
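The misalignment described above can be reproduced with a minimal sketch. The frame construction is an assumption based on the tables shown, and the name naive is only illustrative:

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df1 = pd.DataFrame({"n": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0], "column1": [0.0] * 6})
df2 = pd.DataFrame({"n": [6.0, 7.0, 8.0], "column2": [1.0] * 3})

# Plain addition aligns on the row index (0..5 vs 0..2), not on n:
# labels 0-2 exist in both frames, labels 3-5 only in df1, so those become NaN.
naive = df1["column1"] + df2["column2"]
print(naive.isna().tolist())
```

The first three rows get summed even though their n values differ, which is why an alignment on n is needed.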
Use Series.add, but first create indexes from column n with set_index:
df = (df2.set_index('n')['column2']
.add(df1.set_index('n')['column1'], fill_value=0)
.reset_index(name='column1'))
print (df)
n column1
0 5.0 0.0
1 6.0 1.0
2 7.0 1.0
3 8.0 1.0
4 9.0 0.0
5 10.0 0.0
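The same idea can be wrapped in a small reusable function. This is only a sketch: add_on_key is a hypothetical helper name, the frames are rebuilt from the tables in the question, and it assumes the key values are unique within each frame.

```python
import pandas as pd

def add_on_key(left, right, key, left_col, right_col):
    """Hypothetical helper: sum two columns, matching rows by `key`
    instead of by row index. Assumes `key` is unique in each frame."""
    total = left.set_index(key)[left_col].add(
        right.set_index(key)[right_col], fill_value=0
    )
    # The result is indexed by `key`; turn it back into a two-column frame.
    return total.reset_index(name=left_col)

df1 = pd.DataFrame({"n": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0], "column1": [0.0] * 6})
df2 = pd.DataFrame({"n": [6.0, 7.0, 8.0], "column2": [1.0] * 3})
df3 = add_on_key(df1, df2, "n", "column1", "column2")
print(df3)
```

Because fill_value=0 treats a key missing from one side as 0, rows of df1 without a match in df2 keep their original value.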
Another solution uses merge with a left join:
df = (df1.merge(df2, on='n', how='left'))
df['column1'] = df['column2'].add(df['column1'], fill_value=0)
df = df.drop('column2', axis=1)
print (df)
n column1
0 5.0 0.0
1 6.0 1.0
2 7.0 1.0
3 8.0 1.0
4 9.0 0.0
5 10.0 0.0
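The choice of join matters when df2 contains n values that df1 lacks. A minimal sketch with made-up data (the values and the merge_sum helper are illustrative assumptions, not part of the original post):

```python
import pandas as pd

# Illustrative data: n=11 exists only in df2.
df1 = pd.DataFrame({"n": [5.0, 6.0], "column1": [2.0, 3.0]})
df2 = pd.DataFrame({"n": [6.0, 11.0], "column2": [1.0, 4.0]})

def merge_sum(how):
    """Hypothetical helper: merge on n with the given join type, then sum."""
    merged = df1.merge(df2, on="n", how=how)
    merged["column1"] = merged["column1"].add(merged["column2"], fill_value=0)
    return merged.drop(columns="column2")

left = merge_sum("left")    # keeps only the n values present in df1
outer = merge_sum("outer")  # also keeps n=11 from df2
print(left)
print(outer)
```

A left join fits the question as asked, since the desired output has exactly the rows of df1; an outer join would be the choice if unmatched rows of df2 should survive as well.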
I solved it by merging the dataframes and summing the columns with pandas:
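import pandas as pd
df = pd.merge(df1, df2, how='outer', on='n')
# fill missing values before summing so unmatched rows keep their own value
df['sum'] = df['column1'].fillna(0) + df['column2'].fillna(0)
df = df[['n', 'sum']]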
The result looks like this:
n sum
0 5.0 0.0
1 6.0 1.0
2 7.0 1.0
3 8.0 1.0
4 9.0 0.0
5 10.0 0.0
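One thing worth checking with a merge-then-sum approach is where the missing values get filled. The + operator propagates NaN, so filling after the addition sets every unmatched row to 0 regardless of its own value. With the data in the question this is invisible, because column1 is all zeros. A minimal sketch with nonzero values (made up for illustration):

```python
import pandas as pd

# Illustrative data (not from the original post): column1 is nonzero here.
df1 = pd.DataFrame({"n": [5.0, 6.0], "column1": [2.0, 3.0]})
df2 = pd.DataFrame({"n": [6.0], "column2": [1.0]})
merged = pd.merge(df1, df2, how="outer", on="n")

# `+` propagates NaN, so filling afterwards zeroes the unmatched row (n=5).
fill_after = (merged["column1"] + merged["column2"]).fillna(0)

# Treating missing values as 0 during the addition keeps the 2.0 for n=5.
fill_during = merged["column1"].add(merged["column2"], fill_value=0)

print(fill_after.tolist(), fill_during.tolist())
```

Filling before or during the addition is the safer default whenever column1 can hold values other than zero.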