Joining two pandas dataframes with the same columns and merging rows with the same index

I have two dataframes, df1 and df2, each with the same column names and using timestamps as indices. I want to concatenate the two dataframes while merging rows that share an index, preferring the values stored in df2. This is poorly worded, but see below. For example:

>>> df1= TimeStamp A_Output B_Output C_Output
          00:00:00  20       15       5
          00:00:06  20       NaN      3
          00:00:15  15       6      NaN
          00:00:20  20       NaN      5
          00:00:30  25       14      10


>>> df2= TimeStamp A_Output B_Output C_Output
          00:00:00  15       5        8
          00:00:04  16       NaN      NaN
          00:00:06  17       NaN      NaN
          00:00:15  NaN      NaN      2
          00:00:18  19       NaN      NaN
          00:00:21  14       NaN      NaN
          00:00:26  32       NaN      5
          

>>> df3= TimeStamp A_Output B_Output C_Output
          00:00:00  15       5        8
          00:00:04  16       NaN      NaN
          00:00:06  17       NaN      3
          00:00:15  15       6        2
          00:00:18  19       NaN      NaN
          00:00:20  20       NaN      5
          00:00:21  14       NaN      NaN
          00:00:26  32       NaN      5
          00:00:30  25       14      10

df3 is what I would like to achieve. It has a row for every timestamp that appears in either df1 or df2. For each common index, where df2 is not NaN we take its value; otherwise we keep the value stored in df1.

df1 >>> 00:00:15  15        6     NaN
df2 >>> 00:00:15  NaN      NaN     2
df3 >>> 00:00:15  15        6      2

df1 >>> 00:00:00  20        15     5
df2 >>> 00:00:00  15         5     8
df3 >>> 00:00:00  15         5     8

For clarification, see the examples above. I really can't find a way to do this. For reference, each dataframe has around 90 columns and 100k+ rows.

Try combine_first:

df3 = df2.combine_first(df1)

print(df3)

           A_Output  B_Output  C_Output
TimeStamp                              
00:00:00       15.0       5.0       8.0
00:00:04       16.0       NaN       NaN
00:00:06       17.0       NaN       3.0
00:00:15       15.0       6.0       2.0
00:00:18       19.0       NaN       NaN
00:00:20       20.0       NaN       5.0
00:00:21       14.0       NaN       NaN
00:00:26       32.0       NaN       5.0
00:00:30       25.0      14.0      10.0
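
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the two frames from the question and applies combine_first. It assumes the timestamps are stored as plain strings in an index named TimeStamp (the question does not say which dtype is used), and the `cols` and `nan` names are only for illustration. Calling combine_first on df2 keeps every non-NaN value of df2 and fills the remaining gaps, including rows that exist only in df1, from df1.

```python
import numpy as np
import pandas as pd

nan = np.nan
cols = ["A_Output", "B_Output", "C_Output"]

# Assumption: timestamps are kept as strings; a DatetimeIndex or
# TimedeltaIndex should behave the same way for this operation.
df1 = pd.DataFrame(
    [[20, 15, 5], [20, nan, 3], [15, 6, nan], [20, nan, 5], [25, 14, 10]],
    index=pd.Index(
        ["00:00:00", "00:00:06", "00:00:15", "00:00:20", "00:00:30"],
        name="TimeStamp",
    ),
    columns=cols,
    dtype=float,
)

df2 = pd.DataFrame(
    [
        [15, 5, 8],
        [16, nan, nan],
        [17, nan, nan],
        [nan, nan, 2],
        [19, nan, nan],
        [14, nan, nan],
        [32, nan, 5],
    ],
    index=pd.Index(
        ["00:00:00", "00:00:04", "00:00:06", "00:00:15",
         "00:00:18", "00:00:21", "00:00:26"],
        name="TimeStamp",
    ),
    columns=cols,
    dtype=float,
)

# df2 has priority: its non-NaN values win, df1 fills what is missing.
# The result is indexed by the union of both indexes.
df3 = df2.combine_first(df1)
print(df3)
```

The order matters: df1.combine_first(df2) would give df1 priority instead, so the row for 00:00:00 would keep 20, 15, 5 rather than 15, 5, 8.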
