pandas: remove all rows within a time interval of another series' time index (i.e. time range exclusion)

Question

Suppose I have two dataframes:

#df1
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:03.233    1.0
2016-09-12 13:00:10.256    1.0
2016-09-12 13:00:19.605    1.0

#df2
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:00.233    0.0
2016-09-12 13:00:01.016    1.0
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
2016-09-12 13:00:19.705    0.0

I want to remove all rows in df2 whose timestamps fall within 1 second after any of the time indices in df1, yielding:

#result
time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0

What's the most efficient way to do this? I don't see anything useful for time range exclusions in the API.

Answer 1

You can use pd.merge_asof, which was added in pandas 0.19.0 and accepts a tolerance argument that limits how far apart two keys may be and still count as a match.

# Assuming time is set as the index of both frames
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

# Rows of df2 with no df1 timestamp in the preceding second get NaN in df1's columns
merged = pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s'))
df2.loc[merged.isnull().any(axis=1)]

Note that by default matching is carried out in the backward direction: for each row of the left DataFrame (df2), the selected row is the last one in the right DataFrame (df1) whose "on" key (here "time") is less than or equal to the left key. Hence the tolerance parameter extends only backward, so a df2 row matches only if a df1 timestamp lies at most 1 second before it.
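To make the backward matching concrete, here is a minimal sketch that rebuilds the question's sample data and applies the same idea. The column names "value" and "matched" are assumptions for illustration (the question does not show column names), and this variant works on copies instead of resetting the index in place.

```python
import pandas as pd

# Sample data from the question. The column name "value" is an assumption.
t1 = pd.to_datetime([
    "2016-09-12 13:00:00.017", "2016-09-12 13:00:03.233",
    "2016-09-12 13:00:10.256", "2016-09-12 13:00:19.605",
])
t2 = pd.to_datetime([
    "2016-09-12 13:00:00.017", "2016-09-12 13:00:00.233",
    "2016-09-12 13:00:01.016", "2016-09-12 13:00:01.505",
    "2016-09-12 13:00:06.017", "2016-09-12 13:00:07.233",
    "2016-09-12 13:00:08.256", "2016-09-12 13:00:19.705",
])
df1 = pd.DataFrame({"value": 1.0}, index=pd.Index(t1, name="time"))
df2 = pd.DataFrame({"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
                   index=pd.Index(t2, name="time"))

left = df2.reset_index()
# Carry df1's timestamp in a separate (hypothetical) column so a match is easy to detect.
right = df1.reset_index()[["time"]].assign(matched=lambda d: d["time"])

# Default direction is backward: look for a df1 timestamp at most 1 s before each df2 row.
merged = pd.merge_asof(left, right, on="time", tolerance=pd.Timedelta("1s"))

# Keep the df2 rows that found no match.
result = df2[merged["matched"].isna().to_numpy()]
print(result)
```

On the sample data this keeps the four rows listed in the question's expected result.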

To allow forward as well as backward lookups, starting with 0.20.0 you can pass direction='nearest' in the function call. The tolerance then extends both ways, giving a +/- matching range.
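The difference between the two directions can be sketched with a tiny, hypothetical example (these timestamps and the "matched" column are made up for illustration and are not from the question): one left row sits shortly before the only right timestamp, the other shortly after it.

```python
import pandas as pd

# Hypothetical data: one right-hand timestamp, one left row on each side of it.
left = pd.DataFrame({"time": pd.to_datetime(["2016-09-12 13:00:02.900",
                                             "2016-09-12 13:00:03.500"])})
right = pd.DataFrame({"time": pd.to_datetime(["2016-09-12 13:00:03.233"])})
right["matched"] = right["time"]

tol = pd.Timedelta("1s")
backward = pd.merge_asof(left, right, on="time", tolerance=tol)
nearest = pd.merge_asof(left, right, on="time", tolerance=tol, direction="nearest")

backward_hits = backward["matched"].notna().tolist()
nearest_hits = nearest["matched"].notna().tolist()
print(backward_hits)  # [False, True]: only the row after 03.233 matches
print(nearest_hits)   # [True, True]: rows on both sides match
```

Which one is appropriate depends on whether the exclusion window should cover only the second after each df1 timestamp or the second on either side.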

Answer 2

Similar idea to @Nickil Maveli's, but using reindex to build a Boolean indexer. Note that method='nearest' matches within 1 second in either direction:

df2 = df2[df1.reindex(df2.index, method='nearest', tolerance=pd.Timedelta('1s')).isnull()]

The resulting output:

time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
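As a hedged variation on the reindex approach: if only the second after each df1 timestamp should be excluded, method='pad' looks backward only. The sketch below assumes both objects are Series with a sorted DatetimeIndex, and adds one hypothetical extra row (13:00:09.800, not in the question) to show where the two methods differ.

```python
import pandas as pd

s1 = pd.Series(1.0, index=pd.to_datetime([
    "2016-09-12 13:00:00.017", "2016-09-12 13:00:03.233",
    "2016-09-12 13:00:10.256", "2016-09-12 13:00:19.605",
]))
s2 = pd.Series([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0], index=pd.to_datetime([
    "2016-09-12 13:00:00.017", "2016-09-12 13:00:00.233",
    "2016-09-12 13:00:01.016", "2016-09-12 13:00:01.505",
    "2016-09-12 13:00:06.017", "2016-09-12 13:00:07.233",
    "2016-09-12 13:00:08.256", "2016-09-12 13:00:19.705",
]))

# Hypothetical extra row placed shortly *before* a df1 timestamp (10.256).
extra = pd.Series([0.0], index=pd.to_datetime(["2016-09-12 13:00:09.800"]))
s2x = pd.concat([s2, extra]).sort_index()

tol = pd.Timedelta("1s")
# 'nearest' excludes rows within 1 s on either side of a df1 timestamp.
both_sides = s2x[s1.reindex(s2x.index, method="nearest", tolerance=tol).isnull()]
# 'pad' excludes only rows up to 1 s after a df1 timestamp.
after_only = s2x[s1.reindex(s2x.index, method="pad", tolerance=tol).isnull()]

print(both_sides)
print(after_only)
```

With 'nearest' the extra row is dropped because 10.256 is less than a second away; with 'pad' it is kept because the last earlier df1 timestamp (03.233) is out of range.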

Answer 3

One way to do it would be to look up via time indexing (assuming both time columns are indices):

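td = pd.to_timedelta(1, unit='s')
# True where some df1 timestamp lies in [t - 1s, t] for the df2 row at time t
mask = df2.apply(lambda row: df1.loc[row.name - td:row.name].size > 0, axis=1)
df2[~mask]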
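Since the row-wise lookup runs a Python function per row, it may be slow on large frames. The sketch below runs the lookup on the question's sample data and compares it with a vectorized alternative based on np.searchsorted; that alternative is a different technique swapped in for illustration, not part of the answer, and the column name "value" is an assumption.

```python
import numpy as np
import pandas as pd

t1 = pd.to_datetime([
    "2016-09-12 13:00:00.017", "2016-09-12 13:00:03.233",
    "2016-09-12 13:00:10.256", "2016-09-12 13:00:19.605",
])
t2 = pd.to_datetime([
    "2016-09-12 13:00:00.017", "2016-09-12 13:00:00.233",
    "2016-09-12 13:00:01.016", "2016-09-12 13:00:01.505",
    "2016-09-12 13:00:06.017", "2016-09-12 13:00:07.233",
    "2016-09-12 13:00:08.256", "2016-09-12 13:00:19.705",
])
df1 = pd.DataFrame({"value": 1.0}, index=pd.Index(t1, name="time"))
df2 = pd.DataFrame({"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
                   index=pd.Index(t2, name="time"))

# Row-wise lookup: is there a df1 timestamp in [t - 1s, t]?
td = pd.to_timedelta(1, unit="s")
mask_apply = df2.apply(lambda row: df1.loc[row.name - td:row.name].size > 0, axis=1)

# Vectorized alternative (assumes df1's index is sorted).
a1 = df1.index.to_numpy()
a2 = df2.index.to_numpy()
pos = np.searchsorted(a1, a2, side="right") - 1   # last df1 position <= t, or -1
prev = a1[np.clip(pos, 0, None)]                  # clip so indexing stays valid
within = (pos >= 0) & ((a2 - prev) <= np.timedelta64(1, "s"))

result = df2[~within]
print(result)
```

Both masks flag the same rows on the sample data, and dropping them leaves the four rows from the question's expected result.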

Answer 1: score 11, accepted, posted 2016-11-09 17:40:13
Answer 2: score 4, posted 2016-11-09 17:50:25
Answer 3: score 1, posted 2016-11-09 17:46:24
