
Pandas dataframes - join on similar timestamps

I have 2 dataframes:

small_df = 
   time_early            
0, 18:19:20.877154
1, 20:34:24.738802

and large_df, which has many more rows:

   time_late      
0, 11:12:23.879154
1, 11:12:23.879154            
2, 18:19:20.879154
3, 19:01:20.877154
4, 20:34:24.748802

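I want to join them so that each row in small_df is matched with the row in large_df whose time_late comes immediately after its time_early, so that the desired result looks like: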

   time_early           time_late 
0, 18:19:20.877154      18:19:20.879154
1, 20:34:24.738802      20:34:24.748802

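Also, assume that both dataframes may have other columns, which I want to keep in the final result. How can I achieve this? I know I need some kind of merge, but I'm not sure which.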

import pandas as pd

def join_closest_time(row):
    # rows of large_df whose time_late is strictly later than this row's time_early
    later = large_df[large_df.time_late > row['time_early']]
    # take the first match; it is the closest one to time_early only if
    # large_df is sorted by time_late in ascending order
    # (iloc[0] raises IndexError when no later row exists)
    closest = later.iloc[0]
    # concatenate the two rows so the columns of both dataframes are kept
    return pd.concat([row, closest])

small_df.apply(join_closest_time, axis=1)


Out[116]:
    time_early          time_late
0   18:19:20.877154 18:19:20.879154
1   20:34:24.738802 20:34:24.748802

If your large_df is not already sorted, sort it by time_late in ascending order:

# sort_index(by=...) has been removed from pandas; sort_values is its replacement
large_df.sort_values(by='time_late', inplace=True)
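
Once large_df is sorted by time_late, the first later row can also be located with a binary search instead of filtering the whole frame once per row. The following is only a sketch under assumptions: it assumes both columns hold strings in the HH:MM:SS.ffffff form shown in the question, and the helper column name late_key is made up for illustration.

```python
import numpy as np
import pandas as pd

small_df = pd.DataFrame({"time_early": ["18:19:20.877154", "20:34:24.738802"]})
large_df = pd.DataFrame({"time_late": ["11:12:23.879154", "11:12:23.879154",
                                       "18:19:20.879154", "19:01:20.877154",
                                       "20:34:24.748802"]})

fmt = "%H:%M:%S.%f"  # assumed string format; parsing attaches a placeholder date

# Sort large_df on a parsed helper column (hypothetical name "late_key").
large_sorted = (large_df
                .assign(late_key=pd.to_datetime(large_df["time_late"], format=fmt))
                .sort_values("late_key")
                .reset_index(drop=True))
early_key = pd.to_datetime(small_df["time_early"], format=fmt)

# side="right" returns, for each time_early, the position of the first
# time_late that is strictly greater than it.
pos = np.searchsorted(large_sorted["late_key"].to_numpy(),
                      early_key.to_numpy(), side="right")

# A position equal to len(large_sorted) means no later row exists; skip those.
has_match = pos < len(large_sorted)
matched = (large_sorted.iloc[pos[has_match]]
           .drop(columns="late_key")
           .reset_index(drop=True))
result = pd.concat([small_df[has_match].reset_index(drop=True), matched], axis=1)
print(result)
```

Any additional columns in either frame are carried along, because whole rows are selected and placed side by side. Rows of small_df without a later match are dropped in this sketch.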

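If there is any time_late after a given time_early value, take the first one. Otherwise, use None. (This also requires large_df to be sorted by time_late.)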

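small_df['time_late'] = small_df.time_early.apply(
    # select the time_late column by name rather than by position, so this
    # still works when large_df has other columns
    lambda time: large_df.loc[large_df.time_late > time, 'time_late'].iloc[0]
    if large_df.time_late.gt(time).any() else None)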

>>> small_df
        time_early        time_late
0  18:19:20.877154  18:19:20.879154
1  20:34:24.738802  20:34:24.748802
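
As an alternative to the row-by-row approaches above, pandas provides pd.merge_asof, which joins each left row to the nearest right row in a chosen direction and keeps the other columns of both frames. The sketch below is not taken from the answers; it assumes the time columns are strings in the HH:MM:SS.ffffff form shown in the question, and the helper column names early_key and late_key are made up for illustration.

```python
import pandas as pd

small_df = pd.DataFrame({"time_early": ["18:19:20.877154", "20:34:24.738802"]})
large_df = pd.DataFrame({"time_late": ["11:12:23.879154", "11:12:23.879154",
                                       "18:19:20.879154", "19:01:20.877154",
                                       "20:34:24.748802"]})

fmt = "%H:%M:%S.%f"  # assumed string format; parsing attaches a placeholder date

# merge_asof needs sorted, non-object join keys, so parse the strings into
# helper columns (hypothetical names) and sort both frames on them.
small = (small_df
         .assign(early_key=pd.to_datetime(small_df["time_early"], format=fmt))
         .sort_values("early_key"))
large = (large_df
         .assign(late_key=pd.to_datetime(large_df["time_late"], format=fmt))
         .sort_values("late_key"))

merged = pd.merge_asof(
    small, large,
    left_on="early_key", right_on="late_key",
    direction="forward",         # look for the next row in large, not the previous one
    allow_exact_matches=False,   # require time_late to be strictly later
).drop(columns=["early_key", "late_key"])

print(merged)
```

This behaves like a left join: every row of small_df is kept, and a row with no later time_late gets missing values in the columns coming from large_df.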

