I have to check some data from an instrument and make sure that there are no missing time intervals. For example, I have a DataFrame like this:
I've tried to generate a new datetime Series with pd.date_range('2020-02-17 10:29:25', periods=1440, freq='T')
and compare it with my data, but I got stuck.
You can copy Datetime into a new column shifted one row down, so that each row holds two consecutive timestamps. Then you can subtract them and check whether the result is greater than 1 minute.
import pandas as pd
import datetime
data = {'Datetime': [
'2020-02-17 10:29:25',
'2020-02-17 10:30:25',
'2020-02-17 10:31:25',
'2020-02-17 10:45:25',
'2020-02-17 10:46:25',
]}
df = pd.DataFrame(data)
df['Datetime'] = pd.to_datetime(df['Datetime'])
df['DT2'] = df['Datetime'].shift(1)
df['diff'] = df['Datetime'] - df['DT2']
# all values
print(df)
# rows where the gap to the previous row is longer than 1 minute
print(df[df['diff'] > datetime.timedelta(seconds=60)])
Result
# all values
Datetime DT2 diff
0 2020-02-17 10:29:25 NaT NaT
1 2020-02-17 10:30:25 2020-02-17 10:29:25 00:01:00
2 2020-02-17 10:31:25 2020-02-17 10:30:25 00:01:00
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
4 2020-02-17 10:46:25 2020-02-17 10:45:25 00:01:00
# compare with 1 minute
Datetime DT2 diff
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
EDIT: see @luigigi's answer for a simpler version, which uses
df[ df['Datetime'].diff() > pd.Timedelta('60s') ]
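As a rough sketch of how the diff() check could be wrapped up for reuse: the helper below is hypothetical (find_gaps and its parameter names are made up here, they are not part of pandas). It assumes the column is already sorted and converted with pd.to_datetime, and returns one row per gap with the last timestamp before the gap, the first timestamp after it, and the gap length.

```python
import pandas as pd


def find_gaps(df, column="Datetime", expected="60s"):
    """Return one row per gap longer than `expected` (hypothetical helper)."""
    times = df[column]
    delta = times.diff()                    # time elapsed since the previous row
    mask = delta > pd.Timedelta(expected)   # the NaT in the first row compares as False
    return pd.DataFrame({
        "gap_start": times.shift(1)[mask],  # last timestamp before the gap
        "gap_end": times[mask],             # first timestamp after the gap
        "gap_length": delta[mask],
    })


# Same sample data as in the answer above.
df = pd.DataFrame({"Datetime": pd.to_datetime([
    "2020-02-17 10:29:25",
    "2020-02-17 10:30:25",
    "2020-02-17 10:31:25",
    "2020-02-17 10:45:25",
    "2020-02-17 10:46:25",
])})

gaps = find_gaps(df)
print(gaps)
```

An empty result means no interval exceeded the expected spacing. If the instrument's timestamps jitter by a second or two, a slightly larger threshold such as expected="61s" may be more practical.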
You can create a date range and check which of its values are missing from the DataFrame column, like this:
df=pd.DataFrame({'Datetime':[*pd.date_range('2020-02-17 10:29:25', periods=3, freq='T'), *pd.date_range('2020-02-17 10:49:25', periods=3, freq='T')]})
df
Datetime
0 2020-02-17 10:29:25
1 2020-02-17 10:30:25
2 2020-02-17 10:31:25
3 2020-02-17 10:49:25
4 2020-02-17 10:50:25
5 2020-02-17 10:51:25
my_range = pd.date_range(start=df['Datetime'].min(), end=df['Datetime'].max(), freq='T')
my_range[~my_range.isin(df['Datetime'])]
DatetimeIndex(['2020-02-17 10:32:25', '2020-02-17 10:33:25',
'2020-02-17 10:34:25', '2020-02-17 10:35:25',
'2020-02-17 10:36:25', '2020-02-17 10:37:25',
'2020-02-17 10:38:25', '2020-02-17 10:39:25',
'2020-02-17 10:40:25', '2020-02-17 10:41:25',
'2020-02-17 10:42:25', '2020-02-17 10:43:25',
'2020-02-17 10:44:25', '2020-02-17 10:45:25',
'2020-02-17 10:46:25', '2020-02-17 10:47:25',
'2020-02-17 10:48:25'],
dtype='datetime64[ns]', freq='T')
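The same comparison can probably also be written with DatetimeIndex.difference, and reindex can put the gaps back into the table as NaN rows. This is only a sketch: it assumes the timestamps are unique and sit exactly on the one-minute grid, the 'value' column is made up for illustration, and it uses freq='min' because the 'T' alias is deprecated in recent pandas versions.

```python
import pandas as pd

# Same timestamps as above, plus a made-up 'value' column for illustration.
times = [*pd.date_range("2020-02-17 10:29:25", periods=3, freq="min"),
         *pd.date_range("2020-02-17 10:49:25", periods=3, freq="min")]
df = pd.DataFrame({"Datetime": times, "value": range(6)})

# Every timestamp we would expect between the first and last reading.
full_range = pd.date_range(df["Datetime"].min(), df["Datetime"].max(), freq="min")

# Timestamps that are expected but absent from the data.
missing = full_range.difference(pd.DatetimeIndex(df["Datetime"]))

# Reindexing onto the full range keeps existing rows and adds NaN rows for the gaps.
filled = df.set_index("Datetime").reindex(full_range)

print(len(missing))
print(filled["value"].isna().sum())
```

The reindexed frame is convenient if the next step is interpolation or plotting, since the gaps are then explicit rows rather than absent ones.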
Or you could use this (inspired by @furas):
# .copy() avoids a SettingWithCopyWarning when the new column is added
df_missing = df[df['Datetime'].diff() > pd.Timedelta('60s')].copy()
df_missing['diff'] = df['Datetime'].diff()
df_missing
Datetime diff
3 2020-02-17 10:49:25 00:18:00
Check that every row is exactly one minute earlier than the next one, and do something if it isn't. This needs import datetime and assumes the Datetime column still holds strings.
fmt = '%Y-%m-%d %H:%M:%S'
for i in range(len(df.index) - 1):
    current = datetime.datetime.strptime(df.loc[i, 'Datetime'], fmt)
    following = datetime.datetime.strptime(df.loc[i + 1, 'Datetime'], fmt)
    if current != following - datetime.timedelta(minutes=1):
        print('Data missing')