Outer merge pandas dataframes with overwrite

Question

I have two DataFrames, df_old and df_new, with the same columns: x (identifier or index) and y (data).

Now I want to overwrite the data in df_old that is also available in df_new, and add the data from df_new that is not in df_old; basically an outer merge on x with overwrite of y.

I tried pandas.DataFrame.merge and pandas.DataFrame.update, but I am not able to achieve the desired result in one line or without row-wise computations.

Example:

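import numpy as np
import pandas as pd

# Existing data: x = 0..9
x = np.array(range(0, 10))
y = np.array(range(0, 10))
df_old = pd.DataFrame(data={'x':x,'y':y})

# New data: x = 5..14, overlapping df_old on x = 5..9
x = np.array(range(5, 15))
y = np.array(range(0, 10))
df_new = pd.DataFrame(data={'x':x,'y':y})

# Desired result: old y for x = 0..4, new y for x = 5..14
x = np.array(range(0, 15))
y = np.append(np.array(range(0, 5)), np.array(range(0, 10)))
df_desired = pd.DataFrame(data={'x':x,'y':y})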

EDIT: The focus of the solution should be on execution time and memory efficiency. A simple solution, e.g. a one-liner, would be nice to have.

Answer 1

I would use pd.merge, as you suggested, and then post-process the result so that values from df_new overwrite those from df_old wherever both exist.

Using your example:

df_merged = pd.merge(df_old, df_new, on="x", how="outer")
>>> df_merged
     x  y_x  y_y
0    0  0.0  NaN
1    1  1.0  NaN
2    2  2.0  NaN
3    3  3.0  NaN
4    4  4.0  NaN
5    5  5.0  0.0
6    6  6.0  1.0
7    7  7.0  2.0
8    8  8.0  3.0
9    9  9.0  4.0
10  10  NaN  5.0
11  11  NaN  6.0
12  12  NaN  7.0
13  13  NaN  8.0
14  14  NaN  9.0

Then create a new y column that takes the new value (y_y) where one exists and falls back to the old value (y_x) otherwise:

df_merged["y"] = [new if pd.notna(new) else old for old, new in zip(df_merged["y_x"], df_merged["y_y"])]
>>> df_merged
     x  y_x  y_y    y
0    0  0.0  NaN  0.0
1    1  1.0  NaN  1.0
2    2  2.0  NaN  2.0
3    3  3.0  NaN  3.0
4    4  4.0  NaN  4.0
5    5  5.0  0.0  0.0
6    6  6.0  1.0  1.0
7    7  7.0  2.0  2.0
8    8  8.0  3.0  3.0
9    9  9.0  4.0  4.0
10  10  NaN  5.0  5.0
11  11  NaN  6.0  6.0
12  12  NaN  7.0  7.0
13  13  NaN  8.0  8.0
14  14  NaN  9.0  9.0
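
The row-by-row list comprehension above can likely be replaced by a vectorized call. A minimal sketch, assuming the merged frame has the default y_x / y_y suffixes shown above and that NaN in y_y always means "no new value" (a genuine NaN in df_new would then not overwrite the old value):

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df_old = pd.DataFrame({"x": np.arange(0, 10), "y": np.arange(0, 10)})
df_new = pd.DataFrame({"x": np.arange(5, 15), "y": np.arange(0, 10)})

df_merged = pd.merge(df_old, df_new, on="x", how="outer")

# Take y_y (new data) where present, otherwise fall back to y_x (old data).
df_merged["y"] = df_merged["y_y"].fillna(df_merged["y_x"])
```

This produces the same y column as the list comprehension, without a Python-level loop over the rows.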

Then just select the correct columns:

df_merged = df_merged[["x","y"]]
>>> df_merged
     x    y
0    0  0.0
1    1  1.0
2    2  2.0
3    3  3.0
4    4  4.0
5    5  0.0
6    6  1.0
7    7  2.0
8    8  3.0
9    9  4.0
10  10  5.0
11  11  6.0
12  12  7.0
13  13  8.0
14  14  9.0
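
Note that y comes back as float, whereas df_desired in the question holds integers: the outer merge introduces NaN, which forces a float dtype. A sketch of casting back, assuming no NaN remains after the overwrite step (df_result is a name introduced here for illustration, and Series.fillna stands in for the overwrite step):

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df_old = pd.DataFrame({"x": np.arange(0, 10), "y": np.arange(0, 10)})
df_new = pd.DataFrame({"x": np.arange(5, 15), "y": np.arange(0, 10)})

df_merged = pd.merge(df_old, df_new, on="x", how="outer")
# Overwrite step: new value where present, old value otherwise.
df_merged["y"] = df_merged["y_y"].fillna(df_merged["y_x"])

# Casting to an integer dtype raises if NaN is still present, so check first.
assert df_merged["y"].notna().all()

# Keep only x and y, and restore the original dtype of y.
df_result = df_merged[["x", "y"]].astype({"y": df_old["y"].dtype})
```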

Answer 2

I found a possibly efficient solution, since it only uses boolean masks, but it is still ugly; I would have expected pandas to offer a more convenient way to do this:

df_desired = pd.merge(df_old, df_new, how='outer', on='x')
# Series.append was removed in pandas 2.0; pd.concat does the same job.
df_desired['y'] = pd.concat([df_desired.loc[df_desired['y_y'].isna(), 'y_x'], df_desired.loc[df_desired['y_y'].notna(), 'y_y']])
df_desired = df_desired[['x','y']]
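
If the merge is not needed for anything else, one alternative that avoids the intermediate y_x / y_y columns is to stack both frames and keep the last row per x. This is a sketch rather than a benchmarked answer; it assumes x is unique within each frame, and df_result is an illustrative name:

```python
import numpy as np
import pandas as pd

# Example frames from the question.
df_old = pd.DataFrame({"x": np.arange(0, 10), "y": np.arange(0, 10)})
df_new = pd.DataFrame({"x": np.arange(5, 15), "y": np.arange(0, 10)})

df_result = (
    pd.concat([df_old, df_new])               # df_new rows come after df_old rows
    .drop_duplicates(subset="x", keep="last")  # for a repeated x, keep the df_new row
    .sort_values("x")
    .reset_index(drop=True)
)
```

Because no NaN is introduced along the way, y keeps its integer dtype here.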

solution1
2 2022-01-05 15:41:46

solution2
0 2022-01-05 16:22:46
