Although this question looks similar to earlier ones, I could not solve it with the existing answers, so I need help from the experts.
I am trying to update an existing data frame (df1) with data received from a different data frame (df2), producing a new data frame (df). Data frame df2 may contain new columns, new rows, or new/blank data. Here is an example of what I am trying to accomplish.
df1 = pd.DataFrame(np.array([[1, 'A1', 'B1'], [2, 'A2', 'B2'], [3, 'A3', 'B3']]), columns=['ID', 'A', 'B'])
df1
ID A B
0 1 A1 B1
1 2 A2 B2
2 3 A3 B3
df2 = pd.DataFrame(np.array([[1, 'A1X', 'B1X'], [2, 'A2X', ''], [4, 'A4', 'B4']]), columns=['ID', 'A', 'B'])
df2
ID A B
0 1 A1X B1X
1 2 A2X NaN
2 4 A4 B4
The desired output is:
df
ID A B
0 1 A1X B1X
1 2 A2X B2
2 3 A3 B3
3 4 A4 B4
Can you please help me?
A novice pandas user
Set the index of each dataframe with set_index() and use combine_first().
Also, per Scott Boston's answer, make sure to replace blank values with NaN first; the output below skips that step, so B for ID 2 stays blank:
df2.set_index('ID').combine_first(df1.set_index('ID')).reset_index()
Out[1]:
ID A B
0 1 A1X B1X
1 2 A2X
2 3 A3 B3
3 4 A4 B4
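The blank in row ID 2 above is there because combine_first() only fills null values, and an empty string is not null. A minimal sketch of that difference, using two small hypothetical Series (`new` and `old` are illustration names, not from the thread):

```python
import numpy as np
import pandas as pd

# Hypothetical one-row slices: "new" holds a blank in B, "old" holds the value to fall back on.
new = pd.Series(["A2X", ""], index=["A", "B"])
old = pd.Series(["A2", "B2"], index=["A", "B"])

# An empty string counts as real data, so nothing is taken from "old".
kept_blank = new.combine_first(old)

# After converting blanks to NaN, combine_first() falls back to "old" for B.
filled = new.replace("", np.nan).combine_first(old)

print(kept_blank)
print(filled)
```

Here `kept_blank` should still hold the empty string in B, while `filled` picks up B2 from `old`.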
Try:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.array([[1, 'A1', 'B1'], [2, 'A2', 'B2'], [3, 'A3', 'B3']]), columns=['ID', 'A', 'B'])
df2 = pd.DataFrame(np.array([[1, 'A1X', 'B1X'], [2, 'A2X', ''], [4, 'A4', 'B4']]), columns=['ID', 'A', 'B'])
df1 = df1.set_index('ID').replace('', np.nan)
df2 = df2.set_index('ID').replace('', np.nan)
df_out = df2.combine_first(df1)
print(df_out)
Output:
A B
ID
1 A1X B1X
2 A2X B2
3 A3 B3
4 A4 B4
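The question also says df2 may bring new columns. Since combine_first() aligns on the union of both indexes and both column sets, the same approach should cover that case. The sketch below is only an illustration: the helper name `update_frame` and the extra column C are assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["A1", "A2", "A3"],
                    "B": ["B1", "B2", "B3"]})
# Column "C" is a made-up extra column to show how new columns are handled.
df2 = pd.DataFrame({"ID": [1, 2, 4],
                    "A": ["A1X", "A2X", "A4"],
                    "B": ["B1X", "", "B4"],
                    "C": ["C1", "C2", "C4"]})


def update_frame(old, new, key="ID"):
    """Hypothetical helper: prefer values from `new`, fall back to `old`."""
    # Blanks become NaN so that combine_first() treats them as missing.
    new = new.set_index(key).replace("", np.nan)
    old = old.set_index(key)
    # Rows and columns found in either frame are kept.
    return new.combine_first(old).reset_index()


df = update_frame(df1, df2)
print(df)
```

With this data, ID 2 keeps B2 from df1, ID 4 is added as a new row, and ID 3 gets NaN in C because only df2 carries that column.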