Pandas dataframes - join on similar timestamps
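I have two dataframes.

small_df =
       time_early
0, 18:19:20.877154
1, 20:34:24.738802

and large_df, which has many more rows:
       time_late
0, 11:12:23.879154
1, 11:12:23.879154
2, 18:19:20.879154
3, 19:01:20.877154
4, 20:34:24.748802

I want to join them so that each row of small_df is matched with the row of large_df that comes immediately after it in time. The desired result looks like
       time_early       time_late
0, 18:19:20.877154 18:19:20.879154
1, 20:34:24.738802 20:34:24.748802

Also, both dataframes may have other columns, and I would like to keep those in the final result. How can I achieve this? I know I need some kind of merge, but I'm not sure which.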
import pandas as pd

def join_closest_time(row):
    # row is a single row of small_df (a Series), because apply is called with axis=1
    # boolean mask of the large_df rows whose time_late is later than this time_early
    time_greater = large_df.time_late > row['time_early']
    # take the first such row; it is the closest one only if large_df
    # is sorted by time_late in ascending order
    # (iloc[0] raises IndexError when no later row exists)
    close_row = large_df[time_greater].iloc[0]
    # concatenate the two rows (both are Series) into one combined row
    return pd.concat([row, close_row])

small_df.apply(join_closest_time, axis=1)
Out[116]:
time_early time_late
0 18:19:20.877154 18:19:20.879154
1 20:34:24.738802 20:34:24.748802
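
The row-wise apply above scans large_df once for every row of small_df. As a possible alternative, here is a minimal sketch using pd.merge_asof, assuming pandas 0.20 or later (where the direction parameter is available) and assuming the time strings can be parsed with pd.to_timedelta. The label and value columns are hypothetical and are included only to show that extra columns from both frames survive the join.

```python
import pandas as pd

# Sample data from the question; "label" and "value" are hypothetical extra columns.
small_df = pd.DataFrame({
    "time_early": ["18:19:20.877154", "20:34:24.738802"],
    "label": ["a", "b"],
})
large_df = pd.DataFrame({
    "time_late": ["11:12:23.879154", "11:12:23.879154", "18:19:20.879154",
                  "19:01:20.877154", "20:34:24.748802"],
    "value": [10, 11, 12, 13, 14],
})

# merge_asof does not accept string keys, so parse the
# time-of-day strings into timedeltas first (an assumption about the data).
small_df["time_early"] = pd.to_timedelta(small_df["time_early"])
large_df["time_late"] = pd.to_timedelta(large_df["time_late"])

# Both frames must be sorted on their key. direction="forward" picks the first
# right-hand row at or after the left key; allow_exact_matches=False makes it
# strictly after.
result = pd.merge_asof(
    small_df.sort_values("time_early"),
    large_df.sort_values("time_late"),
    left_on="time_early",
    right_on="time_late",
    direction="forward",
    allow_exact_matches=False,
)
print(result)
```

Rows of small_df with no later time_late are kept, with missing values in the columns that come from large_df.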
If your large_df is not already sorted, sort it by time_late in ascending order:
large_df.sort_values(by='time_late', inplace=True)
If there is any time_late after a given time_early value, take the first one. Otherwise, use None.
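small_df['time_late'] = small_df.time_early.apply(
    # select the time_late column by name rather than by position,
    # so the lookup still works when large_df has other columns
    lambda time: large_df.loc[large_df.time_late > time, 'time_late'].iloc[0]
    if large_df.time_late.gt(time).any() else None)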
>>> small_df
time_early time_late
0 18:19:20.877154 18:19:20.879154
1 20:34:24.738802 20:34:24.748802
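
The "first later value, otherwise nothing" lookup can also be done for all rows at once with a binary search. The following is only a sketch, assuming the times have been parsed into timedeltas. The third time_early value (23:00:00) is a hypothetical row added to show the no-match case, which comes out as NaT rather than None.

```python
import numpy as np
import pandas as pd

small_df = pd.DataFrame({"time_early": pd.to_timedelta(
    ["18:19:20.877154", "20:34:24.738802", "23:00:00.000000"])})  # last row is hypothetical
large_df = pd.DataFrame({"time_late": pd.to_timedelta(
    ["11:12:23.879154", "11:12:23.879154", "18:19:20.879154",
     "19:01:20.877154", "20:34:24.748802"])})

# Sorted array of candidate times; searchsorted requires ascending order.
late = np.sort(large_df["time_late"].to_numpy())
early = small_df["time_early"].to_numpy()

# side="right" returns, for each early time, the position of the first
# late time that is strictly greater than it.
pos = np.searchsorted(late, early, side="right")

# A position equal to len(late) means no later time exists.
found = pos < len(late)

# Label -1 is not in the default 0..n-1 index, so reindex yields NaT there.
matched = pd.Series(late).reindex(np.where(found, pos, -1))
small_df["time_late"] = matched.to_numpy()
print(small_df)
```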