Pandas (Python) - Update column of a dataframe from another one with conditions and different columns

I had a problem and found a solution, but it feels like the wrong way to do it. Maybe there is a more 'canonical' way.

I already got an answer to a very similar problem, but here the two dataframes do not have the same number of rows. Sorry for the "double-post", but the first question is still valid, so I think it's better to open a new one.

Problem

I have two dataframes that I would like to merge without adding extra columns and without erasing existing information. Example:

Existing dataframe (df)

   A  A2  B
0  1   4  0
1  2   5  1
2  2   5  1

Dataframe to merge (df2)

   A  A2  B
0  1   4  2
1  3   5  2

I would like to update df with df2 where the values in columns 'A' and 'A2' match. The result would be:

   A  A2  B
0  1   4  2  <= Update value ONLY
1  2   5  1
2  2   5  1

Here is my solution, but I don't think it's a very good one.

import pandas as pd

df = pd.DataFrame([[1,4,0],[2,5,1],[2,5,1]],columns=['A','A2','B'])

df2 = pd.DataFrame([[1,4,2],[3,5,2]],columns=['A','A2','B'])

df = df.merge(df2,on=['A', 'A2'],how='left')
# Assign the result back; in-place fillna on a selected column may not update df in recent pandas
df['B_y'] = df['B_y'].fillna(0)
df['B'] = df['B_x']+df['B_y']
df = df.drop(['B_x','B_y'], axis=1)
print(df)
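
Note that summing B_x and B_y only gives the expected result when the existing B is 0 on the matched rows. A minimal sketch of a variant that overwrites instead of adding, assuming df2 has at most one row per (A, A2) pair (the '_new' suffix and the names merged and result are illustrative choices, not part of the original code):

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 0], [2, 5, 1], [2, 5, 1]], columns=['A', 'A2', 'B'])
df2 = pd.DataFrame([[1, 4, 2], [3, 5, 2]], columns=['A', 'A2', 'B'])

# Left merge keeps every row of df; B_new is NaN where df2 has no matching key.
merged = df.merge(df2, on=['A', 'A2'], how='left', suffixes=('', '_new'))

# Take df2's value where one exists, otherwise keep the original B.
merged['B'] = merged['B_new'].fillna(merged['B']).astype(int)
result = merged.drop(columns='B_new')
print(result)
```

If df2 could contain several rows with the same (A, A2) pair, the left merge would duplicate rows of df, so that case would need deduplicating first.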

I also tried this:

rows = (df[['A','A2']] == df2[['A','A2']]).all(axis=1)
df.loc[rows,'B'] = df2.loc[rows,'B']

But I get this error because the two dataframes have a different number of rows:

ValueError: Can only compare identically-labeled DataFrame objects
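
For reference, the error can be reproduced in isolation. The sketch below also shows DataFrame.eq, which aligns both frames on their index before comparing, so it should not raise. The variable raised is only there for illustration. This comparison is still done label by label, so it only finds matches that sit at the same index in both frames.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 0], [2, 5, 1], [2, 5, 1]], columns=['A', 'A2', 'B'])
df2 = pd.DataFrame([[1, 4, 2], [3, 5, 2]], columns=['A', 'A2', 'B'])
cols = ['A', 'A2']

# `==` refuses to compare frames whose row labels differ (3 rows vs 2 rows).
try:
    df[cols] == df2[cols]
    raised = False
except ValueError:
    raised = True

# .eq() aligns on the index first; rows missing from df2 compare as False.
rows = df[cols].eq(df2[cols]).all(axis=1)
print(raised, rows.tolist())
```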

Does anyone have a better way to do this? Thanks!

I think you can use DataFrame.isin to check which rows are the same in both DataFrames. Then use mask to set those rows to NaN, which combine_first fills from df2. Finally, cast to int:

mask = df[['A', 'A2']].isin(df2[['A', 'A2']]).all(1)
print (mask)
0     True
1    False
2    False
dtype: bool

df.B = df.B.mask(mask).combine_first(df2.B).astype(int)
print (df)
   A  A2  B
0  1   4  2
1  2   5  1
2  2   5  1
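
One caveat: when isin is given a DataFrame, it matches on index and column labels as well as values, so the approach above relies on the matching rows sitting at the same index in df and df2. A rough sketch of a key-based lookup for the case where they do not, assuming the (A, A2) pairs in df2 are unique (df2 is deliberately reordered here, and keys, lookup and new_b are illustrative names):

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 0], [2, 5, 1], [2, 5, 1]], columns=['A', 'A2', 'B'])
# Same rows as df2 in the question, but swapped so the match is at another index.
df2 = pd.DataFrame([[3, 5, 2], [1, 4, 2]], columns=['A', 'A2', 'B'])
keys = ['A', 'A2']

# The label-based isin check finds nothing once the rows are reordered.
mask = df[keys].isin(df2[keys]).all(axis=1)

# Replacement values indexed by the (A, A2) pairs of df2.
lookup = df2.set_index(keys)['B']

# Look up the key of every row in df; keys absent from df2 give NaN.
new_b = lookup.reindex(pd.MultiIndex.from_frame(df[keys])).to_numpy()

# Use the looked-up value where present, otherwise keep the existing B.
df['B'] = pd.Series(new_b, index=df.index).fillna(df['B']).astype(int)
print(df)
```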

With a minor tweak to how the boolean mask is created, you can get it to work:

cols = ['A', 'A2']
# Slice it to match the shape of the other dataframe to compare elementwise
rows = (df[cols].values[:df2.shape[0]] == df2[cols].values).all(1)
df.loc[rows,'B'] = df2.loc[rows,'B']
df
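
The mask built this way has one entry per row of df2, which is shorter than df, and recent pandas versions may reject a boolean indexer whose length differs from the frame. A sketch of the same positional comparison with the mask padded to the length of df (n and matches are illustrative names; like the code above, it only compares rows at the same position):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 4, 0], [2, 5, 1], [2, 5, 1]], columns=['A', 'A2', 'B'])
df2 = pd.DataFrame([[1, 4, 2], [3, 5, 2]], columns=['A', 'A2', 'B'])
cols = ['A', 'A2']

# Only the first n rows exist in both frames.
n = min(len(df), len(df2))

# Compare those rows position by position.
matches = (df[cols].to_numpy()[:n] == df2[cols].to_numpy()[:n]).all(axis=1)

# Pad with False so the mask has one entry per row of df.
rows = np.zeros(len(df), dtype=bool)
rows[:n] = matches

# Copy B from df2 for the matching positions only.
df.loc[rows, 'B'] = df2['B'].to_numpy()[:n][matches]
print(df)
```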

