
Identifying overlapping events (datetime records) in a pandas dataframe

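I am having trouble detecting overlapping start_datetime / end_datetime ranges in my dataset.

Currently my dataset looks like this:

[image: current dataset]

but I would like to get to this:

[image: desired dataset]

Original code to generate the dataset: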

import pandas as pd
df = pd.DataFrame({
    'start_datetime':[
        '2000-01-01 02:23:49', '1997-12-20 07:22:10', '2000-01-05 03:42:29', '2002-02-25 17:20:09', '1999-06-30 03:33:20',
    ],
    'end_datetime':[
        '2000-01-06 04:50:20', '1998-12-20 01:24:12', '2000-03-01 11:01:11', '2003-02-25 22:05:02', '2000-01-01 02:50:30',
    ],
    
})
df['start_datetime'] = pd.to_datetime(df['start_datetime'])
df['end_datetime'] = pd.to_datetime(df['end_datetime'])
df

Is there a way (efficient or inefficient) to detect the overlaps without sorting the columns?

NumPy broadcasting

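# Split the two columns into 1-D arrays of start and end times.
s, e = df[['start_datetime', 'end_datetime']].to_numpy().T
# s[:, None] and e[:, None] broadcast against s and e, comparing every row with every other row.
m1 = (s[:, None] > s) & (s[:, None] < e)  # this row's start falls strictly inside another interval
m2 = (e[:, None] < e) & (e[:, None] > s)  # this row's end falls strictly inside another interval

# A row is flagged if either condition holds against at least one other row.
df['overlap'] = (m1 | m2).any(axis=1)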

Result

>>> df

       start_datetime        end_datetime  overlap
0 2000-01-01 02:23:49 2000-01-06 04:50:20     True
1 1997-12-20 07:22:10 1998-12-20 01:24:12    False
2 2000-01-05 03:42:29 2000-03-01 11:01:11     True
3 2002-02-25 17:20:09 2003-02-25 22:05:02    False
4 1999-06-30 03:33:20 2000-01-01 02:50:30     True
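
One case the check above does not appear to cover is a row whose interval fully contains another row's interval: neither its start nor its end falls strictly inside the shorter interval, so the longer row would stay False unless it also overlaps something else. Below is a minimal sketch of a symmetric variant, assuming that two intervals count as overlapping when each one starts before the other ends. The helper name flag_overlaps and the demo frame are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def flag_overlaps(df, start="start_datetime", end="end_datetime"):
    """Return a boolean array: True where a row's interval overlaps any other row's."""
    s = df[start].to_numpy()
    e = df[end].to_numpy()
    # Intervals i and j overlap when s_i < e_j and s_j < e_i.
    m = (s[:, None] < e) & (s < e[:, None])
    # Every non-empty interval overlaps itself, so clear the diagonal.
    np.fill_diagonal(m, False)
    return m.any(axis=1)


# Hypothetical data: row 0 fully contains row 1; row 2 is disjoint from both.
demo = pd.DataFrame({
    "start_datetime": pd.to_datetime(["2000-01-01", "2000-01-10", "2001-01-01"]),
    "end_datetime": pd.to_datetime(["2000-12-31", "2000-01-20", "2001-06-01"]),
})
demo["overlap"] = flag_overlaps(demo)
print(demo)
```

Like the broadcasting answer, this builds an n x n boolean matrix, so memory grows quadratically with the number of rows. On the five-row dataset from the question it should give the same flags as the result shown above.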
