
Updating dataframe with new dataframe, overwriting

Although this question seems similar to previous ones, I could not solve it with the previous answers and I need help from experts.

I am trying to update an existing data frame (df1) with data received from a different data frame (df2), producing a new data frame (df). df2 may contain new columns, new rows, or new/blank values. Here is an example of what I am trying to accomplish.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 'A1', 'B1'], [2, 'A2', 'B2'], [3, 'A3', 'B3']]), columns=['ID', 'A', 'B'])
df1

    ID  A   B
0   1   A1  B1
1   2   A2  B2
2   3   A3  B3

df2 = pd.DataFrame(np.array([[1, 'A1X', 'B1X'], [2, 'A2X', ''], [4, 'A4', 'B4']]), columns=['ID', 'A', 'B'])
df2

    ID  A   B
0   1   A1X B1X
1   2   A2X NaN
2   4   A4  B4
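A quick check of what df2 actually holds, assuming the same pandas/NumPy construction as above. Because `np.array` is built from a mix of integers and strings, NumPy stores every cell as a string, so `ID` contains `'1'`, `'2'`, ... and the "blank" in df2 is an empty string `''`, not `NaN`. pandas does not treat `''` as missing, which is why the answers convert it first.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array([[1, 'A1X', 'B1X'], [2, 'A2X', ''], [4, 'A4', 'B4']]),
    columns=['ID', 'A', 'B'],
)

# Mixed ints and strings make NumPy fall back to a string array,
# so even the ID column holds strings
print(isinstance(df2.loc[0, 'ID'], str))   # True

# The blank cell is an empty string, which pandas does not count as missing
print(df2.loc[1, 'B'] == '')               # True
print(df2['B'].isna().any())               # False

# After replacing '' with NaN, the cell is treated as missing
print(df2.replace('', np.nan)['B'].isna().tolist())  # [False, True, False]
```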

The desired output is:

df
    ID  A   B
0   1   A1X B1X
1   2   A2X B2
2   3   A3  B3
3   4   A4  B4

Can you please help me?

A novice pandas user

Set the index of each dataframe with set_index() and use combine_first():

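Also, per Scott Boston's answer, make sure to replace blank values with NaN first; otherwise the empty string in df2 is kept as a real value, which is why B is blank for ID 2 in the output below.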

df2.set_index('ID').combine_first(df1.set_index('ID')).reset_index()
Out[1]: 
  ID    A    B
0  1  A1X  B1X
1  2  A2X     
2  3   A3   B3
3  4   A4   B4
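Putting the two steps together gives the desired output from the question. The sketch below (same data, blank-to-NaN conversion first) relies on documented `combine_first()` behaviour: the calling frame's non-null values are kept, its nulls are filled from the other frame, and the result's rows and columns are the union of both. The name `df2_clean` is just an illustrative intermediate variable.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 'A1', 'B1'], [2, 'A2', 'B2'], [3, 'A3', 'B3']]),
                   columns=['ID', 'A', 'B'])
df2 = pd.DataFrame(np.array([[1, 'A1X', 'B1X'], [2, 'A2X', ''], [4, 'A4', 'B4']]),
                   columns=['ID', 'A', 'B'])

# Treat empty strings as missing so they do not overwrite df1's values
df2_clean = df2.replace('', np.nan)

# Align on ID: df2 wins where it has a value, df1 fills the gaps,
# and rows present in only one of the frames are kept
df = (df2_clean.set_index('ID')
               .combine_first(df1.set_index('ID'))
               .reset_index())
print(df)
```

Here ID 2 gets `A2X` from df2 and `B2` from df1, ID 3 comes entirely from df1, and ID 4 entirely from df2.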

Try:

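import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1, 'A1', 'B1'], [2, 'A2', 'B2'], [3, 'A3', 'B3']]), columns=['ID', 'A', 'B'])
df2 = pd.DataFrame(np.array([[1, 'A1X', 'B1X'], [2, 'A2X', ''], [4, 'A4', 'B4']]), columns=['ID', 'A', 'B'])

# Index both frames by ID and turn empty strings into NaN so they count as missing
df1 = df1.set_index('ID').replace('', np.nan)
df2 = df2.set_index('ID').replace('', np.nan)

# df2's values take priority; its gaps are filled from df1, and rows from both are kept
df_out = df2.combine_first(df1)
print(df_out)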

Output:

      A    B
ID          
1   A1X  B1X
2   A2X   B2
3    A3   B3
4    A4   B4
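The question also says df2 may bring new columns. Since `combine_first()` takes the union of columns as well as rows, a column that exists only in df2 shows up in the result, with NaN for IDs that df2 does not cover. A small sketch, assuming a hypothetical extra column `C` in df2 (not part of the original data); the frames are built from dicts here, so `ID` stays an integer rather than a string.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'A': ['A1', 'A2', 'A3'], 'B': ['B1', 'B2', 'B3']})
# 'C' is a hypothetical new column that only df2 has
df2 = pd.DataFrame({'ID': [1, 2, 4], 'A': ['A1X', 'A2X', 'A4'],
                    'B': ['B1X', '', 'B4'], 'C': ['C1', 'C2', 'C4']})

df1 = df1.set_index('ID').replace('', np.nan)
df2 = df2.set_index('ID').replace('', np.nan)

# Rows and columns are unioned: ID 3 is kept from df1, column C from df2
df_out = df2.combine_first(df1)
print(df_out)
```

ID 3 has no value in `C` because neither frame supplies one there, so that cell is NaN.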
