Joining two pandas dataframes with the same columns and merging rows with the same index
I have two dataframes, df1 and df2, with the same column names and timestamps as indices. I want to concatenate the two dataframes while merging rows that share an index, preferring the values stored in df2. That is poorly worded, so see the example below.
For example:
>>> df1= TimeStamp A_Output B_Output C_Output
00:00:00 20 15 5
00:00:06 20 NaN 3
00:00:15 15 6 NaN
00:00:20 20 NaN 5
00:00:30 25 14 10
>>> df2= TimeStamp A_Output B_Output C_Output
00:00:00 15 5 8
00:00:04 16 NaN NaN
00:00:06 17 NaN NaN
00:00:15 NaN NaN 2
00:00:18 19 NaN NaN
00:00:21 14 NaN NaN
00:00:26 32 NaN 5
>>> df3= TimeStamp A_Output B_Output C_Output
00:00:00 15 5 8
00:00:04 16 NaN NaN
00:00:06 17 NaN 3
00:00:15 15 6 2
00:00:18 19 NaN NaN
00:00:20 20 NaN 5
00:00:21 14 NaN NaN
00:00:26 32 NaN 5
00:00:30 25 14 10
df3 is what I would like to achieve. It has a row for every timestamp that appears in df1 or df2. For each common index, where df2 is not NaN we take its value; otherwise we keep the value stored in df1.
df1 >>> 00:00:15 15 6 NaN
df2 >>> 00:00:15 NaN NaN 2
df3 >>> 00:00:15 15 6 2
df1 >>> 00:00:00 20 15 5
df2 >>> 00:00:00 15 5 8
df3 >>> 00:00:00 15 5 8
See the examples above for clarification. I really can't find a way to do this. For reference, each dataframe has around 90 columns and 100k+ rows.
Try combine_first:
df3 = df2.combine_first(df1)
print(df3)
A_Output B_Output C_Output
TimeStamp
00:00:00 15.0 5.0 8.0
00:00:04 16.0 NaN NaN
00:00:06 17.0 NaN 3.0
00:00:15 15.0 6.0 2.0
00:00:18 19.0 NaN NaN
00:00:20 20.0 NaN 5.0
00:00:21 14.0 NaN NaN
00:00:26 32.0 NaN 5.0
00:00:30 25.0 14.0 10.0
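To try this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question and applies combine_first. It assumes the timestamps are plain strings in an index named TimeStamp, since the post does not show how the frames were constructed; the `manual` frame is only an illustration of what combine_first appears to do here (take df2 over the union of both indexes and fill its gaps from df1), not a description of the library's internals.

```python
import numpy as np
import pandas as pd

cols = ["A_Output", "B_Output", "C_Output"]

# Assumption: timestamps are stored as strings; the question does not say.
df1 = pd.DataFrame(
    [[20, 15, 5], [20, np.nan, 3], [15, 6, np.nan], [20, np.nan, 5], [25, 14, 10]],
    index=pd.Index(
        ["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
        name="TimeStamp",
    ),
    columns=cols,
)
df2 = pd.DataFrame(
    [
        [15, 5, 8],
        [16, np.nan, np.nan],
        [17, np.nan, np.nan],
        [np.nan, np.nan, 2],
        [19, np.nan, np.nan],
        [14, np.nan, np.nan],
        [32, np.nan, 5],
    ],
    index=pd.Index(
        ["00:00:00", "00:00:04", "00:00:06", "00:00:15",
         "00:00:18", "00:00:21", "00:00:26"],
        name="TimeStamp",
    ),
    columns=cols,
)

# df2 takes priority; its NaN cells and missing rows are filled from df1.
df3 = df2.combine_first(df1)
print(df3)

# Hand-rolled version of the same idea, for illustration only:
# align both frames on the union of their indexes, then fill df2's gaps.
union = df2.index.union(df1.index)
manual = df2.reindex(union).fillna(df1.reindex(union))
```

Note that the caller is the frame whose values win, so df2.combine_first(df1) and df1.combine_first(df2) give different results wherever both frames hold a non-NaN value.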