
How do I check a Pandas Datetime column for missing values?

I need to check some data from an instrument and make sure there are no missing time intervals. For example, I have a DataFrame like this:

[Image: screenshot of the example DataFrame (not available)]

I tried generating a new datetime range with pd.date_range('2020-02-17 10:29:25', periods=1440, freq='T') and comparing it with my data, but I got stuck.

You can copy Datetime into a new column shifted one row down, so each row holds both the current and the previous timestamp. Then you can subtract them and check whether the result is bigger than 1 minute.

import pandas as pd
import datetime

data = {'Datetime': [
    '2020-02-17 10:29:25',
    '2020-02-17 10:30:25',
    '2020-02-17 10:31:25',
    '2020-02-17 10:45:25',    
    '2020-02-17 10:46:25',    
]}

df = pd.DataFrame(data)
df['Datetime'] = pd.to_datetime(df['Datetime'])

df['DT2'] = df['Datetime'].shift(1)
df['diff'] = df['Datetime'] - df['DT2']

# all values

print(df)

# compare with 1 minute

print(df[df['diff'] > datetime.timedelta(seconds=60)])

Result

# all values

             Datetime                 DT2     diff
0 2020-02-17 10:29:25                 NaT      NaT
1 2020-02-17 10:30:25 2020-02-17 10:29:25 00:01:00
2 2020-02-17 10:31:25 2020-02-17 10:30:25 00:01:00
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
4 2020-02-17 10:46:25 2020-02-17 10:45:25 00:01:00

# compare with 1 minute

             Datetime                 DT2     diff
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00

EDIT: see @luigigi's answer for a simpler version, which uses

df[ df['Datetime'].diff() > pd.Timedelta('60s') ]
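Building on the same diff idea, here is a minimal sketch that also reports how many readings each gap is missing. It assumes readings are expected exactly every 60 seconds; the helper name summarize_gaps and its output column names are hypothetical, chosen only for illustration.

```python
import pandas as pd


def summarize_gaps(times: pd.Series, step: pd.Timedelta = pd.Timedelta(minutes=1)) -> pd.DataFrame:
    """Hypothetical helper: list every gap larger than `step` between consecutive timestamps."""
    times = pd.to_datetime(times).sort_values().reset_index(drop=True)
    diff = times.diff()        # first entry is NaT, and NaT > step evaluates to False
    is_gap = diff > step
    gaps = pd.DataFrame({
        'gap_start': times.shift(1)[is_gap],  # last reading before the gap
        'gap_end': times[is_gap],             # first reading after the gap
    })
    # e.g. a 14-minute jump with a 1-minute step means 13 readings are absent
    gaps['missing'] = (diff[is_gap] // step - 1).astype(int)
    return gaps.reset_index(drop=True)


data = ['2020-02-17 10:29:25', '2020-02-17 10:30:25', '2020-02-17 10:31:25',
        '2020-02-17 10:45:25', '2020-02-17 10:46:25']
print(summarize_gaps(pd.Series(data)))  # one gap, 10:31:25 -> 10:45:25, with 13 missing readings
```

Sorting first is a deliberate choice: with unsorted input, a negative diff would silently hide a gap.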

You can create a date range and check which of its values are missing from the DataFrame column, like this:

df=pd.DataFrame({'Datetime':[*pd.date_range('2020-02-17 10:29:25', periods=3, freq='T'), *pd.date_range('2020-02-17 10:49:25', periods=3, freq='T')]})
df
             Datetime
0 2020-02-17 10:29:25
1 2020-02-17 10:30:25
2 2020-02-17 10:31:25
3 2020-02-17 10:49:25
4 2020-02-17 10:50:25
5 2020-02-17 10:51:25

my_range = pd.date_range(start=df['Datetime'].min(), end=df['Datetime'].max(), freq='T')

my_range[~my_range.isin(df['Datetime'])]
DatetimeIndex(['2020-02-17 10:32:25', '2020-02-17 10:33:25',
               '2020-02-17 10:34:25', '2020-02-17 10:35:25',
               '2020-02-17 10:36:25', '2020-02-17 10:37:25',
               '2020-02-17 10:38:25', '2020-02-17 10:39:25',
               '2020-02-17 10:40:25', '2020-02-17 10:41:25',
               '2020-02-17 10:42:25', '2020-02-17 10:43:25',
               '2020-02-17 10:44:25', '2020-02-17 10:45:25',
               '2020-02-17 10:46:25', '2020-02-17 10:47:25',
               '2020-02-17 10:48:25'],
              dtype='datetime64[ns]', freq='T')
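Because the range above runs from the first reading to the last one, it cannot reveal readings missing at the very start or end of the recording. A sketch that instead uses the fixed grid from the question (1440 one-minute steps, i.e. one full day) together with DatetimeIndex.difference. The start time and period count come from the question; the sample readings are made up for illustration. Like the isin version, this only matches timestamps that fall exactly on the grid (here, at :25 seconds past the minute).

```python
import pandas as pd

# Illustrative readings: a gap mid-morning, and nothing after 10:51:25
df = pd.DataFrame({'Datetime': [*pd.date_range('2020-02-17 10:29:25', periods=3, freq='min'),
                                *pd.date_range('2020-02-17 10:49:25', periods=3, freq='min')]})

# Expected grid from the question: one reading per minute for a full day
expected = pd.date_range('2020-02-17 10:29:25', periods=1440, freq='min')

# Grid timestamps with no matching reading (returned sorted, without duplicates)
missing = expected.difference(df['Datetime'])
print(len(missing))  # 1434: 17 in the mid-morning gap plus every minute after 10:51:25
```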

Or you could use this (inspired by @furas):

# compute the gap to the previous row once, then keep only the rows that follow a gap
diff = df['Datetime'].diff()
df_missing = df[diff > pd.Timedelta('60s')].assign(diff=diff)
df_missing

             Datetime     diff
3 2020-02-17 10:49:25 00:18:00
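If you want to keep working with the data rather than just list the gaps, another option (not part of the answers above) is to reindex onto the regular one-minute grid, so that each missing minute becomes a row of NaN. A minimal sketch; the value column is a hypothetical stand-in for the instrument readings, and the timestamps must be unique for reindex to work.

```python
import pandas as pd

# Hypothetical instrument data with a gap between 10:31:25 and 10:49:25
df = pd.DataFrame({
    'Datetime': [*pd.date_range('2020-02-17 10:29:25', periods=3, freq='min'),
                 *pd.date_range('2020-02-17 10:49:25', periods=3, freq='min')],
    'value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

grid = pd.date_range(df['Datetime'].min(), df['Datetime'].max(), freq='min')

# Timestamps absent from df come back as rows filled with NaN
full = df.set_index('Datetime').reindex(grid)

# Caveat: a reading that was genuinely NaN in df would also be counted here
missing_times = full[full['value'].isna()].index
print(len(full), len(missing_times))  # 23 rows on the grid, 17 of them missing
```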

Check that each row is exactly 1 minute earlier than the next one, and do something if it isn't.

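import datetime
for i in range(len(df.index) - 1):  # assumes 'Datetime' holds strings, not parsed timestamps
    current = datetime.datetime.strptime(df.loc[i, 'Datetime'], '%Y-%m-%d %H:%M:%S')
    following = datetime.datetime.strptime(df.loc[i + 1, 'Datetime'], '%Y-%m-%d %H:%M:%S')
    if current + datetime.timedelta(minutes=1) != following:  # rows should be exactly 1 min apart
        print('Data missing')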
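The loop above can also be written without a Python loop, using Series.diff(). Unlike the "greater than 60s" filter in the first two answers, testing for inequality with exactly one minute also flags duplicated, out-of-order or too-close timestamps. A sketch, assuming the column has already been parsed with pd.to_datetime and that readings should be exactly 60 seconds apart; the sample data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Datetime': pd.to_datetime([
    '2020-02-17 10:29:25', '2020-02-17 10:30:25', '2020-02-17 10:30:25',  # duplicate reading
    '2020-02-17 10:45:25', '2020-02-17 10:46:25',                        # 15-minute gap before this pair
])})

step = df['Datetime'].diff()
# Skip the first row: its diff is NaT because it has no predecessor
irregular = df.iloc[1:][step.iloc[1:] != pd.Timedelta(minutes=1)]
print(irregular)  # rows 2 (duplicate) and 3 (after the gap)
```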

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
