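Joining two pandas DataFrames with the same columns and merging rows with the same index
I have two DataFrames, df1 and df2, with identical column names and a timestamp index. I want to join the two DataFrames while merging rows that share an index, preferring the values stored in df2. That is poorly worded, so see the example below.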
>>> df1
TimeStamp  A_Output  B_Output  C_Output
00:00:00   20        15        5
00:00:06   20        NaN       3
00:00:15   15        6         NaN
00:00:20   20        NaN       5
00:00:30   25        14        10
>>> df2
TimeStamp  A_Output  B_Output  C_Output
00:00:00   15        5         8
00:00:04   16        NaN       NaN
00:00:06   17        NaN       NaN
00:00:15   NaN       NaN       2
00:00:18   19        NaN       NaN
00:00:21   14        NaN       NaN
00:00:26   32        NaN       5
>>> df3
TimeStamp  A_Output  B_Output  C_Output
00:00:00   15        5         8
00:00:04   16        NaN       NaN
00:00:06   17        NaN       3
00:00:15   15        6         2
00:00:18   19        NaN       NaN
00:00:21   14        NaN       NaN
00:00:26   32        NaN       5
00:00:30   25        14        10
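df3 is what I want to achieve. Every row in df1 and df2 is indexed by a timestamp. For each index present in both, I take the df2 value wherever it is not NaN; otherwise I keep the value stored in df1.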
df1 >>> 00:00:15 15 6 NaN
df2 >>> 00:00:15 NaN NaN 2
df3 >>> 00:00:15 15 6 2
df1 >>> 00:00:00 20 15 5
df2 >>> 00:00:00 15 5 8
df3 >>> 00:00:00 15 5 8
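The rows above illustrate the rule. I really can't find a way to do this. For reference, each DataFrame has about 90 columns and 100k+ rows.
Try combine_first, which keeps the non-NaN values of df2 and fills its NaN cells and missing rows from df1: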
df3 = df2.combine_first(df1)
print(df3)
           A_Output  B_Output  C_Output
TimeStamp
00:00:00       15.0       5.0       8.0
00:00:04       16.0       NaN       NaN
00:00:06       17.0       NaN       3.0
00:00:15       15.0       6.0       2.0
00:00:18       19.0       NaN       NaN
00:00:20       20.0       NaN       5.0
00:00:21       14.0       NaN       NaN
00:00:26       32.0       NaN       5.0
00:00:30       25.0      14.0      10.0
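For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frames from the question and assumes the timestamps are plain strings (the real index may well be a DatetimeIndex or TimedeltaIndex, which should behave the same way). The name `manual` is only illustrative: it spells out the same rule by hand with reindex and fillna so the result can be compared with combine_first.

```python
import numpy as np
import pandas as pd

cols = ["A_Output", "B_Output", "C_Output"]

# Sample data copied from the question; string timestamps are an assumption.
df1 = pd.DataFrame(
    [[20, 15, 5], [20, np.nan, 3], [15, 6, np.nan], [20, np.nan, 5], [25, 14, 10]],
    index=pd.Index(
        ["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
        name="TimeStamp",
    ),
    columns=cols,
    dtype=float,
)
df2 = pd.DataFrame(
    [
        [15, 5, 8],
        [16, np.nan, np.nan],
        [17, np.nan, np.nan],
        [np.nan, np.nan, 2],
        [19, np.nan, np.nan],
        [14, np.nan, np.nan],
        [32, np.nan, 5],
    ],
    index=pd.Index(
        ["00:00:00", "00:00:04", "00:00:06", "00:00:15",
         "00:00:18", "00:00:21", "00:00:26"],
        name="TimeStamp",
    ),
    columns=cols,
    dtype=float,
)

# df2 takes priority; df1 only fills df2's NaN cells and missing rows.
df3 = df2.combine_first(df1)

# The same rule written out by hand (illustrative only):
# align both frames on the union of the indexes, then fill df2's gaps from df1.
all_idx = df1.index.union(df2.index)
manual = df2.reindex(all_idx).fillna(df1.reindex(all_idx))

print(df3)
```

Because the result covers the union of both indexes, it also contains the 00:00:20 row that exists only in df1; the hand-written df3 in the question leaves that row out.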