pandas: Remove all rows within a time interval of another series's time index (i.e. time range exclusion)

Suppose I have two dataframes:

#df1
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:03.233    1.0
2016-09-12 13:00:10.256    1.0
2016-09-12 13:00:19.605    1.0

#df2
time
2016-09-12 13:00:00.017    1.0
2016-09-12 13:00:00.233    0.0
2016-09-12 13:00:01.016    1.0
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
2016-09-12 13:00:19.705    0.0

I want to remove all rows in df2 whose time falls within 1 second after any time index in df1 (that is, every time t with s <= t <= s + 1s for some s in df1), yielding:

#result
time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0
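To make the rule concrete, here is a brute-force sketch using NumPy broadcasting. It assumes time is the index of both frames and that the value column is called `value` (both assumptions; `exclude_window` is a hypothetical helper, not a pandas API). It compares every pair of timestamps, so it needs O(n1 × n2) memory and is best treated as a reference for checking faster approaches.

```python
import numpy as np
import pandas as pd

DAY = "2016-09-12 "
# Sample data from the question; time is the index, 'value' is an assumed column name
df1 = pd.DataFrame(
    {"value": [1.0, 1.0, 1.0, 1.0]},
    index=pd.to_datetime([DAY + t for t in
        ["13:00:00.017", "13:00:03.233", "13:00:10.256", "13:00:19.605"]]).rename("time"),
)
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.to_datetime([DAY + t for t in
        ["13:00:00.017", "13:00:00.233", "13:00:01.016", "13:00:01.505",
         "13:00:06.017", "13:00:07.233", "13:00:08.256", "13:00:19.705"]]).rename("time"),
)


def exclude_window(df1, df2, window=pd.Timedelta("1s")):
    """Drop rows of df2 whose time t satisfies s <= t <= s + window for some s in df1.index."""
    t = df2.index.values[:, None]  # shape (n2, 1)
    s = df1.index.values[None, :]  # shape (1, n1)
    diff = t - s                   # timedelta64 matrix, shape (n2, n1)
    hit = ((diff >= np.timedelta64(0, "ns")) & (diff <= window.to_timedelta64())).any(axis=1)
    return df2[~hit]


result = exclude_window(df1, df2)
print(result)
```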

What's the most efficient way to do this? I don't see anything in the API for time range exclusion.

You can use pd.merge_asof, which was added in pandas 0.19.0 and accepts a tolerance argument that restricts matches to a given time interval.

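import pandas as pd

# Assuming time is set as the index of both DataFrames
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

# Rows of df2 with no df1 time in [t - 1s, t] get NaN in df1's columns, so keep those
# (merge_asof requires both frames to be sorted by 'time')
df2.loc[pd.merge_asof(df2, df1, on='time', tolerance=pd.Timedelta('1s')).isnull().any(axis=1)]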
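A self-contained sketch of the merge_asof approach on the question's data. Instead of testing every column for NaN, it merges a hypothetical `matched` marker column from df1, so a NaN in one of df1's value columns, or a column-name clash that adds `_x`/`_y` suffixes, cannot be mistaken for a missing match. The column names `time` and `value` are assumptions.

```python
import pandas as pd

DAY = "2016-09-12 "
df1 = pd.DataFrame({
    "time": pd.to_datetime([DAY + t for t in
        ["13:00:00.017", "13:00:03.233", "13:00:10.256", "13:00:19.605"]]),
    "value": [1.0, 1.0, 1.0, 1.0],
})
df2 = pd.DataFrame({
    "time": pd.to_datetime([DAY + t for t in
        ["13:00:00.017", "13:00:00.233", "13:00:01.016", "13:00:01.505",
         "13:00:06.017", "13:00:07.233", "13:00:08.256", "13:00:19.705"]]),
    "value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
})

# Right side: only df1's key plus a marker that is never NaN when a match exists
right = df1[["time"]].assign(matched=True)

# Both frames must be sorted by 'time'; the default direction='backward' means a df2 row
# at time t matches the last df1 time s <= t, provided t - s <= 1s
merged = pd.merge_asof(df2, right, on="time", tolerance=pd.Timedelta("1s"))

# merge_asof keeps df2's rows in their original order, so the mask lines up positionally
result = df2[merged["matched"].isna().to_numpy()]
print(result)
```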


Note that matching is performed in the backward direction by default: for each row of the left DataFrame (df2), merge_asof selects the last row of the right DataFrame (df1) whose "on" key ("time") is less than or equal to the left row's key. The tolerance therefore extends only backward, giving a one-sided (-) matching range: a df2 row at time t matches only if some df1 time lies in [t - 1s, t], which is exactly the window the question asks for.

To allow both forward and backward lookups, pandas 0.20.0 and later accept a direction='nearest' argument in the merge_asof call. The tolerance then applies in both directions, giving a +/- matching window.
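The effect of `direction` shows up on a small hypothetical example (made-up timestamps, not the question's data): one df1 time and three df2 times, 0.5 s before, 0.5 s after and 2 s after it. `matches` is a hypothetical helper that reports, for each df2 row, whether merge_asof found a df1 time within the 1 s tolerance.

```python
import pandas as pd

# Hypothetical toy data: a single df1 timestamp at 13:00:10
df1 = pd.DataFrame({"time": pd.to_datetime(["2016-09-12 13:00:10.000"])}).assign(matched=True)
df2 = pd.DataFrame({"time": pd.to_datetime([
    "2016-09-12 13:00:09.500",  # 0.5 s before the df1 time
    "2016-09-12 13:00:10.500",  # 0.5 s after it
    "2016-09-12 13:00:12.000",  # 2 s after it, outside the tolerance either way
])})


def matches(direction):
    merged = pd.merge_asof(df2, df1, on="time",
                           tolerance=pd.Timedelta("1s"), direction=direction)
    return merged["matched"].notna().tolist()


print("backward:", matches("backward"))  # only df1 times <= t are considered
print("forward: ", matches("forward"))   # only df1 times >= t are considered
print("nearest: ", matches("nearest"))   # closest df1 time on either side
```

With these timestamps, backward flags only the row 0.5 s after the df1 time, forward only the row 0.5 s before it, and nearest flags both.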

Similar idea to @Nickil Maveli's answer, but using reindex to build a Boolean indexer (this assumes time is the index of both objects and that df1 is a Series, so that isnull() returns a Boolean Series):

df2 = df2[df1.reindex(df2.index, method='nearest', tolerance=pd.Timedelta('1s')).isnull()]
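A runnable sketch of the reindex approach on the question's data, assuming df1 is a Series indexed by time (called `s1` here) and df2 is a DataFrame with an assumed `value` column. With method='nearest' the tolerance is two-sided (±1 s); swapping in method='ffill' looks only at the last df1 time at or before each df2 time, which reproduces the one-sided window from the question. Both calls need df1's index to be sorted and free of duplicates.

```python
import pandas as pd

DAY = "2016-09-12 "
s1 = pd.Series(1.0, index=pd.to_datetime([DAY + t for t in
    ["13:00:00.017", "13:00:03.233", "13:00:10.256", "13:00:19.605"]]).rename("time"))
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.to_datetime([DAY + t for t in
        ["13:00:00.017", "13:00:00.233", "13:00:01.016", "13:00:01.505",
         "13:00:06.017", "13:00:07.233", "13:00:08.256", "13:00:19.705"]]).rename("time"),
)
tol = pd.Timedelta("1s")

# Two-sided: NaN where no df1 time lies within ±1 s of the df2 time
two_sided = df2[s1.reindex(df2.index, method="nearest", tolerance=tol).isnull()]

# One-sided: NaN where the last df1 time at or before t is more than 1 s earlier (or absent)
one_sided = df2[s1.reindex(df2.index, method="ffill", tolerance=tol).isnull()]
print(one_sided)
```

For this sample both variants keep the same four rows; they diverge when a df2 row falls less than 1 s before a df1 time.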

The resulting output:

time
2016-09-12 13:00:01.505    0.0
2016-09-12 13:00:06.017    1.0
2016-09-12 13:00:07.233    0.0
2016-09-12 13:00:08.256    1.0

One way to do it is to look up via time indexing (assuming both time columns are indices and df2 is a DataFrame). This marks each df2 row that has a df1 time within the preceding second; those are the rows to drop:

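td = pd.to_timedelta(1, unit='s')
# True where some df1 time falls in [t - 1s, t] for the df2 row at time t (df1 index must be sorted)
mask = df2.apply(lambda row: df1.loc[row.name - td:row.name].size > 0, axis=1)
df2 = df2[~mask]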
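Because the apply version calls a Python function once per df2 row, a vectorized alternative worth considering (a different technique, not part of the answer above) counts the df1 timestamps in each window [t - 1s, t] with `np.searchsorted` on the sorted df1 times, costing O(n2 log n1). The sketch assumes time is df2's index, df1's times are sorted, and a `value` column name chosen for illustration.

```python
import numpy as np
import pandas as pd

DAY = "2016-09-12 "
# df1's times; searchsorted requires them to be sorted
t1 = pd.to_datetime([DAY + t for t in
    ["13:00:00.017", "13:00:03.233", "13:00:10.256", "13:00:19.605"]]).values
df2 = pd.DataFrame(
    {"value": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]},
    index=pd.to_datetime([DAY + t for t in
        ["13:00:00.017", "13:00:00.233", "13:00:01.016", "13:00:01.505",
         "13:00:06.017", "13:00:07.233", "13:00:08.256", "13:00:19.705"]]).rename("time"),
)
td = np.timedelta64(1, "s")

t2 = df2.index.values
# (number of df1 times <= t) - (number of df1 times < t - td)
# = number of df1 times in the closed window [t - td, t]
hits = np.searchsorted(t1, t2, side="right") - np.searchsorted(t1, t2 - td, side="left")
result = df2[hits == 0]
print(result)
```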
