Extracting the maximum time gap in a timestamp column over a certain period

I believe my problem is straightforward and that there must be an easy way to solve it, but as I am quite new to Python, and especially to pandas, I could not sort it out on my own.

I made up the following dataframe, which represents a much simpler scenario than the one I have been working on. I am looking for a way to get the maximum interval between consecutive timestamps within each 10-minute period. I am designing a filter, so I want to be able to visualize the maximum time difference for each 10 minutes.

            Timestamp      Category  ...       Class           Speed
0     2013-08-14 22:00:00         1  ...          1               1
1     2013-08-14 22:00:01         1  ...          2               1
2     2013-08-14 22:00:05         1  ...          0               1.1
3     2013-08-14 22:00:07         1  ...          1               1.2
4     2013-08-14 22:00:14         1  ...          3               1.2
5     2013-08-14 22:00:15         1  ...          0               1.2
6     2013-08-14 22:00:16         1  ...          1               1.2
7     2013-08-14 22:00:27         1  ...          2               1.2
8     2013-08-14 22:00:38         1  ...          1               1.2

3000  2013-08-23 22:59:59         0  ...          1               2.3

I am expecting a result that resembles the following:

     Timestamp       Max time gap                                            
2013-08-14 22:00:00    13.416600 
2013-08-14 22:10:00    14.088200    
2013-08-14 22:20:00    7.187153    
2013-08-14 22:30:00    16.444224      
2013-08-14 22:40:00    11.780500        
2013-08-14 22:50:00    12.051639        

I hope I managed to be succinct and precise. I would really appreciate your help on this one!

If you need the maximum difference for each 10 minutes of data:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df = (df.resample('10Min', on='Timestamp')['Timestamp']
        .apply(lambda x: x.diff().dt.total_seconds().max())
        .reset_index(name='Max time gap'))

print (df)
               Timestamp  Max time gap
0    2013-08-14 22:00:00          11.0
1    2013-08-14 22:10:00           NaN
2    2013-08-14 22:20:00           NaN
3    2013-08-14 22:30:00           NaN
4    2013-08-14 22:40:00           NaN
                 ...           ...
1297 2013-08-23 22:10:00           NaN
1298 2013-08-23 22:20:00           NaN
1299 2013-08-23 22:30:00           NaN
1300 2013-08-23 22:40:00           NaN
1301 2013-08-23 22:50:00           NaN

[1302 rows x 2 columns]

Test:

df['new'] = df.resample('10Min', on='Timestamp')['Timestamp'].diff()
print (df)
               Timestamp  Category  Class  Speed      new
0    2013-08-14 22:00:00         1      1    1.0      NaT
1    2013-08-14 22:00:01         1      2    1.0 00:00:01
2    2013-08-14 22:00:05         1      0    1.1 00:00:04
3    2013-08-14 22:00:07         1      1    1.2 00:00:02
4    2013-08-14 22:00:14         1      3    1.2 00:00:07
5    2013-08-14 22:00:15         1      0    1.2 00:00:01
6    2013-08-14 22:00:16         1      1    1.2 00:00:01
7    2013-08-14 22:00:27         1      2    1.2 00:00:11
8    2013-08-14 22:00:38         1      1    1.2 00:00:11
3000 2013-08-23 22:59:59         0      1    2.3      NaT
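
The NaN rows in the result above come from 10-minute windows that hold fewer than two timestamps, so there is no gap to measure. The following is a minimal sketch of one way to keep only the windows that actually have a gap. It is an alternative to the `resample` call, not the answer's own method: it groups on the floored timestamp instead, and the names `sample`, `window`, `gap`, and `gaps` are illustrative only.

```python
import pandas as pd

# Sample timestamps from the question (the other columns are not needed here)
sample = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2013-08-14 22:00:00', '2013-08-14 22:00:01', '2013-08-14 22:00:05',
        '2013-08-14 22:00:07', '2013-08-14 22:00:14', '2013-08-14 22:00:15',
        '2013-08-14 22:00:16', '2013-08-14 22:00:27', '2013-08-14 22:00:38',
        '2013-08-23 22:59:59',
    ])
})

# Label each row with the start of its 10-minute window
window = sample['Timestamp'].dt.floor('10min').rename('Window')

# Gap to the previous row inside the same window, in seconds
gap = sample.groupby(window)['Timestamp'].diff().dt.total_seconds()

# Largest gap per window; windows with a single row have no gap and are dropped
gaps = gap.groupby(window).max().dropna().reset_index(name='Max time gap')
print(gaps)
```

Because the grouping key is built from the rows themselves, windows without any data never appear, so the result stays short even when the timestamps span several days.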

You can resample your data into 10-minute bins and apply an aggregate function to find the time difference between the latest and the earliest timestamp in each bin:

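import numpy as np
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Timestamp', drop=False)

# '10min' means 10 minutes ('10m' is not a minute alias in pandas).
# max - min is the span from the first to the last timestamp in each bin.
df['Timestamp'].resample('10min').agg(lambda x: np.max(x) - np.min(x))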
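
Subtracting the minimum from the maximum gives the span covered by a bin, which is not the same thing as the largest gap between consecutive rows. A small sketch on the first nine timestamps from the question shows the difference. The names `ts`, `window`, `span`, and `max_gap` are illustrative, and grouping on the floored timestamp is used here only to keep the example short.

```python
import pandas as pd

# First nine timestamps from the question, all inside the 22:00 window
ts = pd.Series(pd.to_datetime([
    '2013-08-14 22:00:00', '2013-08-14 22:00:01', '2013-08-14 22:00:05',
    '2013-08-14 22:00:07', '2013-08-14 22:00:14', '2013-08-14 22:00:15',
    '2013-08-14 22:00:16', '2013-08-14 22:00:27', '2013-08-14 22:00:38',
]))

# Start of the 10-minute window each timestamp belongs to
window = ts.dt.floor('10min')
grouped = ts.groupby(window)

# Span: last timestamp minus first timestamp in each window
span = (grouped.max() - grouped.min()).dt.total_seconds()

# Largest gap between consecutive timestamps in each window
max_gap = grouped.diff().dt.total_seconds().groupby(window).max()

print(span.iloc[0], max_gap.iloc[0])
```

For these rows the span is 38 seconds, while the largest gap between neighbouring rows is 11 seconds, so the two aggregates answer different questions.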

Input dataset:

number,Timestamp,Category,Class,Speed
0,2013-08-14 22:00:00,1,1,1
1,2013-08-14 22:00:01,1,2,1
2,2013-08-14 22:00:05,1,0,1.1
3,2013-08-14 22:00:07,1,1,1.2
4,2013-08-14 22:00:14,1,3,1.2
5,2013-08-14 22:00:15,1,0,1.2
6,2013-08-14 22:00:16,1,1,1.2
7,2013-08-14 22:00:27,1,2,1.2
8,2013-08-14 22:00:38,1,1,1.2
8,2013-08-14 22:40:38,1,1,1.2
8,2013-08-14 22:45:38,1,1,1.2
8,2013-08-14 22:49:38,1,1,1.2
8,2013-08-14 22:50:38,1,1,1.2
8,2013-08-14 22:52:38,1,1,1.2
3000,2013-08-23 22:59:59,0,1,1

Process:

import pandas as pd

dataset = pd.read_csv('dataset.csv')
timestampField = pd.to_datetime(dataset['Timestamp'])
startDate = pd.to_datetime('2013-08-14 22:00:00')  # start of the first window
episode = pd.Timedelta('10 minutes')               # window length
maxInterval = pd.Timedelta('0 second')             # largest gap seen in the current window
for index in range(1, len(timestampField)):
    if timestampField[index] >= startDate + episode:
        # This row is past the current window: report the window, then move on
        print(startDate, maxInterval.total_seconds())
        startDate = startDate + episode
        # Skip over windows that contain no rows
        while timestampField[index] >= startDate + episode:
            startDate = startDate + episode
        maxInterval = pd.Timedelta('0 second')
    else:
        # Same window as before: compare the gap to the previous row
        localInterval = timestampField[index] - timestampField[index - 1]
        if localInterval > maxInterval:
            maxInterval = localInterval
# Note: the last window is never printed, because no later row closes it.

Output:

2013-08-14 22:00:00 11.0
2013-08-14 22:40:00 300.0
2013-08-14 22:50:00 120.0
