I have two dataframes, df1 and df2, with the same column names, each indexed by timestamps. I want to concatenate them while merging rows that share an index, preferring the values stored in df2. This is hard to put into words, so see the example below:
df1:
TimeStamp  A_Output  B_Output  C_Output
00:00:00         20        15         5
00:00:06         20       NaN         3
00:00:15         15         6       NaN
00:00:20         20       NaN         5
00:00:30         25        14        10
df2:
TimeStamp  A_Output  B_Output  C_Output
00:00:00         15         5         8
00:00:04         16       NaN       NaN
00:00:06         17       NaN       NaN
00:00:15        NaN       NaN         2
00:00:18         19       NaN       NaN
00:00:21         14       NaN       NaN
00:00:26         32       NaN         5
df3 (desired result):
TimeStamp  A_Output  B_Output  C_Output
00:00:00         15         5         8
00:00:04         16       NaN       NaN
00:00:06         17       NaN         3
00:00:15         15         6         2
00:00:18         19       NaN       NaN
00:00:20         20       NaN         5
00:00:21         14       NaN       NaN
00:00:26         32       NaN         5
00:00:30         25        14        10
df3 is what I would like to achieve. It has a row for every index in df1 and df2. For each shared index, wherever df2 is not NaN we take its value; otherwise we keep the value stored in df1.
df1: 00:00:15   15    6  NaN
df2: 00:00:15  NaN  NaN    2
df3: 00:00:15   15    6    2

df1: 00:00:00   20   15    5
df2: 00:00:00   15    5    8
df3: 00:00:00   15    5    8
See the examples above for clarification. I really can't find a way to do this. For reference, each dataframe has around 90 columns and 100k+ rows.
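A minimal reproducible setup for the example above, in case anyone wants to try answers locally. It assumes the timestamps are stored as zero-padded "HH:MM:SS" strings (the real index may well be a DatetimeIndex or TimedeltaIndex) and includes only three of the ~90 columns.

```python
import numpy as np
import pandas as pd

# Hypothetical reproduction of the example frames: "HH:MM:SS" strings stand in
# for the real timestamps, and only three of the ~90 columns are included.
cols = ["A_Output", "B_Output", "C_Output"]

df1 = pd.DataFrame(
    [[20, 15, 5],
     [20, np.nan, 3],
     [15, 6, np.nan],
     [20, np.nan, 5],
     [25, 14, 10]],
    index=pd.Index(["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
                   name="TimeStamp"),
    columns=cols,
)

df2 = pd.DataFrame(
    [[15, 5, 8],
     [16, np.nan, np.nan],
     [17, np.nan, np.nan],
     [np.nan, np.nan, 2],
     [19, np.nan, np.nan],
     [14, np.nan, np.nan],
     [32, np.nan, 5]],
    index=pd.Index(["00:00:00", "00:00:04", "00:00:06", "00:00:15",
                    "00:00:18", "00:00:21", "00:00:26"],
                   name="TimeStamp"),
    columns=cols,
)

print(df1)
print(df2)
```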
Try combine_first:
# df2's values take priority; its NaNs and missing rows are filled from df1
df3 = df2.combine_first(df1)
print(df3)
A_Output B_Output C_Output
TimeStamp
00:00:00 15.0 5.0 8.0
00:00:04 16.0 NaN NaN
00:00:06 17.0 NaN 3.0
00:00:15 15.0 6.0 2.0
00:00:18 19.0 NaN NaN
00:00:20 20.0 NaN 5.0
00:00:21 14.0 NaN NaN
00:00:26 32.0 NaN 5.0
00:00:30 25.0 14.0 10.0
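To show what combine_first is doing here, the sketch below (not part of the original answer) rebuilds the example frames, runs the call above, and checks it against a spelled-out version of the question's rule: reindex both frames to the union of their timestamps, then fill df2's NaNs from df1. The string timestamps are an assumption for illustration, and the sketch says nothing about how either version performs at 90 columns by 100k+ rows.

```python
import numpy as np
import pandas as pd

# Example frames; the "HH:MM:SS" string index is an illustrative assumption.
cols = ["A_Output", "B_Output", "C_Output"]
df1 = pd.DataFrame(
    [[20, 15, 5], [20, np.nan, 3], [15, 6, np.nan], [20, np.nan, 5], [25, 14, 10]],
    index=pd.Index(["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
                   name="TimeStamp"),
    columns=cols,
)
df2 = pd.DataFrame(
    [[15, 5, 8], [16, np.nan, np.nan], [17, np.nan, np.nan], [np.nan, np.nan, 2],
     [19, np.nan, np.nan], [14, np.nan, np.nan], [32, np.nan, 5]],
    index=pd.Index(["00:00:00", "00:00:04", "00:00:06", "00:00:15",
                    "00:00:18", "00:00:21", "00:00:26"],
                   name="TimeStamp"),
    columns=cols,
)

# The answer's approach: df2 wins wherever it has a non-null value.
df3 = df2.combine_first(df1)

# The same rule written out by hand: align both frames on every timestamp
# that appears in either, then fill df2's gaps with df1's values.
idx = df1.index.union(df2.index)
manual = df2.reindex(idx).fillna(df1.reindex(idx))

# Sanity check that both formulations agree on this example.
pd.testing.assert_frame_equal(df3, manual)
print(df3)
```

Both versions keep every timestamp from either frame, which is why 00:00:20 (present only in df1) appears in the output.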