Extracting the maximum time gap in a timestamp column over a certain period
I believe my problem is straightforward and that there must be an easy way to solve it; however, as I am quite new to Python, and especially to pandas, I could not work it out on my own.
I made up the following dataframe, which represents a much simpler version of what I have been working on. I am looking for a way to get the maximum interval between consecutive timestamps within each 10-minute window. I am designing a filter, so I want to be able to visualize the maximum time difference for each 10 minutes.
Timestamp Category ... Class Speed
0 2013-08-14 22:00:00 1 ... 1 1
1 2013-08-14 22:00:01 1 ... 2 1
2 2013-08-14 22:00:05 1 ... 0 1.1
3 2013-08-14 22:00:07 1 ... 1 1.2
4 2013-08-14 22:00:14 1 ... 3 1.2
5 2013-08-14 22:00:15 1 ... 0 1.2
6 2013-08-14 22:00:16 1 ... 1 1.2
7 2013-08-14 22:00:27 1 ... 2 1.2
8 2013-08-14 22:00:38 1 ... 1 1.2
3000 2013-08-23 22:59:59 0 ... 1 2.3
I am expecting a result that resembles the following:
Timestamp Max time gap
2013-08-14 22:00:00 13.416600
2013-08-14 22:10:00 14.088200
2013-08-14 22:20:00 7.187153
2013-08-14 22:30:00 16.444224
2013-08-14 22:40:00 11.780500
2013-08-14 22:50:00 12.051639
I hope I managed to be succinct and precise. I would really appreciate your help on this one!
If you need the maximum difference within each 10 minutes of data:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = (df.resample('10min', on='Timestamp')['Timestamp']
        .apply(lambda x: x.diff().dt.total_seconds().max())
        .reset_index(name='Max time gap'))
print(df)
Timestamp Max time gap
0 2013-08-14 22:00:00 11.0
1 2013-08-14 22:10:00 NaN
2 2013-08-14 22:20:00 NaN
3 2013-08-14 22:30:00 NaN
4 2013-08-14 22:40:00 NaN
... ...
1297 2013-08-23 22:10:00 NaN
1298 2013-08-23 22:20:00 NaN
1299 2013-08-23 22:30:00 NaN
1300 2013-08-23 22:40:00 NaN
1301 2013-08-23 22:50:00 NaN
[1302 rows x 2 columns]
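The NaN rows above come from 10-minute bins that hold fewer than two timestamps, so there is no consecutive difference to take. The following is a minimal, self-contained sketch of the same idea on the sample rows from the question; the helper name `max_gap_seconds` and the final `dropna()` are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

# Sample rows copied from the question; only the Timestamp column matters here.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-08-14 22:00:00", "2013-08-14 22:00:01", "2013-08-14 22:00:05",
        "2013-08-14 22:00:07", "2013-08-14 22:00:14", "2013-08-14 22:00:15",
        "2013-08-14 22:00:16", "2013-08-14 22:00:27", "2013-08-14 22:00:38",
        "2013-08-23 22:59:59",
    ])
})

def max_gap_seconds(ts):
    # Largest difference between consecutive timestamps in one bin, in seconds.
    # A bin with fewer than two rows has no difference, so the result is NaN.
    return ts.diff().dt.total_seconds().max()

# Index the timestamps by themselves so that resample() can bin them.
series = df.set_index("Timestamp", drop=False)["Timestamp"]

gaps = (series.resample("10min")
              .apply(max_gap_seconds)
              .dropna()                      # assumption: empty bins are not wanted
              .reset_index(name="Max time gap"))
print(gaps)
```

Dropping the NaN rows is a design choice: keep them if the filter needs to know which windows had no usable data.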
Test (run on the original DataFrame, before the aggregation above overwrites df):
df['new'] = df.resample('10min', on='Timestamp')['Timestamp'].diff()
print(df)
Timestamp Category Class Speed new
0 2013-08-14 22:00:00 1 1 1.0 NaT
1 2013-08-14 22:00:01 1 2 1.0 00:00:01
2 2013-08-14 22:00:05 1 0 1.1 00:00:04
3 2013-08-14 22:00:07 1 1 1.2 00:00:02
4 2013-08-14 22:00:14 1 3 1.2 00:00:07
5 2013-08-14 22:00:15 1 0 1.2 00:00:01
6 2013-08-14 22:00:16 1 1 1.2 00:00:01
7 2013-08-14 22:00:27 1 2 1.2 00:00:11
8 2013-08-14 22:00:38 1 1 1.2 00:00:11
3000 2013-08-23 22:59:59 0 1 2.3 NaT
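The per-row differences shown above can also be obtained by grouping on each timestamp floored to its 10-minute bin. This is a swapped-in technique (`dt.floor` plus `groupby(...).diff()`), not the call used in the answer, and the column name `gap` is hypothetical; treat it as a sketch on the question's sample rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2013-08-14 22:00:00", "2013-08-14 22:00:01", "2013-08-14 22:00:05",
        "2013-08-14 22:00:07", "2013-08-14 22:00:14", "2013-08-14 22:00:15",
        "2013-08-14 22:00:16", "2013-08-14 22:00:27", "2013-08-14 22:00:38",
        "2013-08-23 22:59:59",
    ])
})

# Start of the 10-minute bin that each row belongs to.
bin_start = df["Timestamp"].dt.floor("10min")

# Difference to the previous row inside the same bin; the first row of
# every bin has no predecessor and therefore gets NaT.
df["gap"] = df.groupby(bin_start)["Timestamp"].diff()
print(df)
```

Because the difference is taken inside each bin, a long pause that straddles a bin boundary is not counted in either bin.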
You can resample your data into 10-minute bins and apply an aggregate function to find the max time difference (here, the latest minus the earliest timestamp in each bin):
# Parse the column, then index by it so that resample() can bin the rows
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp', drop=False)
df['Timestamp'].resample('10min').agg(lambda x: x.max() - x.min())
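Note that the latest timestamp minus the earliest one is the span covered by the rows in a bin, which can differ from the largest gap between consecutive rows. A small sketch with three timestamps taken from the sample data further down in this thread illustrates the difference; the variable names are only illustrative.

```python
import pandas as pd

# Three rows that fall into the same 10-minute bin (22:40 to 22:50).
ts = pd.Series(pd.to_datetime([
    "2013-08-14 22:40:38",
    "2013-08-14 22:45:38",
    "2013-08-14 22:49:38",
]))

span = ts.max() - ts.min()      # time covered by the rows in the bin
largest_gap = ts.diff().max()   # largest step between neighbouring rows

print(span.total_seconds(), largest_gap.total_seconds())
```

If the goal is the largest pause between consecutive rows, as the question describes, the `diff()`-based aggregate is the closer match.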
Input dataset:
number,Timestamp,Category,Class,Speed
0,2013-08-14 22:00:00,1,1,1
1,2013-08-14 22:00:01,1,2,1
2,2013-08-14 22:00:05,1,0,1.1
3,2013-08-14 22:00:07,1,1,1.2
4,2013-08-14 22:00:14,1,3,1.2
5,2013-08-14 22:00:15,1,0,1.2
6,2013-08-14 22:00:16,1,1,1.2
7,2013-08-14 22:00:27,1,2,1.2
8,2013-08-14 22:00:38,1,1,1.2
8,2013-08-14 22:40:38,1,1,1.2
8,2013-08-14 22:45:38,1,1,1.2
8,2013-08-14 22:49:38,1,1,1.2
8,2013-08-14 22:50:38,1,1,1.2
8,2013-08-14 22:52:38,1,1,1.2
3000,2013-08-23 22:59:59,0,1,1
Process:
import pandas as pd

dataset = pd.read_csv('dataset.csv')
timestampField = pd.to_datetime(dataset['Timestamp'])
startDate = pd.to_datetime('2013-08-14 22:00:00')  # start of the first window
episode = pd.Timedelta('10 minutes')               # window length
maxInterval = pd.Timedelta('0 second')             # largest gap in the current window
for index in range(1, len(timestampField)):
    if timestampField[index] >= startDate + episode:
        # The row falls past the current window: report it, then move on
        print(startDate, maxInterval.total_seconds())
        startDate = startDate + episode
        # Skip windows that contain no rows (>= so that a row lying exactly
        # on a window boundary is assigned to the window it starts)
        while timestampField[index] >= startDate + episode:
            startDate = startDate + episode
        maxInterval = pd.Timedelta('0 second')
    else:
        # Same window: compare with the previous row
        localInterval = timestampField[index] - timestampField[index - 1]
        if localInterval > maxInterval:
            maxInterval = localInterval
Output:
2013-08-14 22:00:00 11.0
2013-08-14 22:40:00 300.0
2013-08-14 22:50:00 120.0
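For comparison, the same three values can be obtained without an explicit loop. The sketch below is one possible vectorized variant, not the method used in the answer above: it assumes the CSV shown earlier (inlined here through `io.StringIO` so the example runs on its own) and uses `dt.floor('10min')` to label each row with the start of its window. The names `bin_start`, `gap_seconds` and `result` are illustrative.

```python
import io
import pandas as pd

csv_text = """number,Timestamp,Category,Class,Speed
0,2013-08-14 22:00:00,1,1,1
1,2013-08-14 22:00:01,1,2,1
2,2013-08-14 22:00:05,1,0,1.1
3,2013-08-14 22:00:07,1,1,1.2
4,2013-08-14 22:00:14,1,3,1.2
5,2013-08-14 22:00:15,1,0,1.2
6,2013-08-14 22:00:16,1,1,1.2
7,2013-08-14 22:00:27,1,2,1.2
8,2013-08-14 22:00:38,1,1,1.2
8,2013-08-14 22:40:38,1,1,1.2
8,2013-08-14 22:45:38,1,1,1.2
8,2013-08-14 22:49:38,1,1,1.2
8,2013-08-14 22:50:38,1,1,1.2
8,2013-08-14 22:52:38,1,1,1.2
3000,2013-08-23 22:59:59,0,1,1
"""
dataset = pd.read_csv(io.StringIO(csv_text), parse_dates=["Timestamp"])

# Start of the 10-minute window that each row belongs to.
bin_start = dataset["Timestamp"].dt.floor("10min")

# Gap to the previous row inside the same window, in seconds
# (NaN for the first row of every window).
gap_seconds = dataset.groupby(bin_start)["Timestamp"].diff().dt.total_seconds()

# Largest gap per window; windows with a single row have no gap and are dropped.
result = gap_seconds.groupby(bin_start).max().dropna()
print(result)
```

On this dataset the result holds 11.0, 300.0 and 120.0 for the windows starting at 22:00, 22:40 and 22:50 on 2013-08-14, which matches the loop's output.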