
How do I check a Pandas Datetime column for missing values?

I have to check some data from an instrument and make sure that there are no missing time intervals. For example, I have a DataFrame like this:

[image: screenshot of the DataFrame]

I've tried to generate a new datetime Series with pd.date_range('2020-02-17 10:29:25', periods=1440, freq='T') and compare it with my column, but I got stuck.

You can put a copy of Datetime in a new column, shifted one row down, so that each row holds two consecutive values. Then you can subtract them and check whether the result is greater than 1 minute.

import pandas as pd
import datetime

data = {'Datetime': [
    '2020-02-17 10:29:25',
    '2020-02-17 10:30:25',
    '2020-02-17 10:31:25',
    '2020-02-17 10:45:25',    
    '2020-02-17 10:46:25',    
]}

df = pd.DataFrame(data)
df['Datetime'] = pd.to_datetime(df['Datetime'])

df['DT2'] = df['Datetime'].shift(1)
df['diff'] = df['Datetime'] - df['DT2']

# all values

print(df)

# compare with 1 minute

print(df[df['diff'] > datetime.timedelta(seconds=60)])

Result

# all values

             Datetime                 DT2     diff
0 2020-02-17 10:29:25                 NaT      NaT
1 2020-02-17 10:30:25 2020-02-17 10:29:25 00:01:00
2 2020-02-17 10:31:25 2020-02-17 10:30:25 00:01:00
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
4 2020-02-17 10:46:25 2020-02-17 10:45:25 00:01:00

# compare with 1 minute

             Datetime                 DT2     diff
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00

EDIT: see @luigigi's answer for a simpler version, which uses

df[ df['Datetime'].diff() > pd.Timedelta('60s') ]
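The diff() idea can also report where each gap starts and ends and how many samples are missing in it. The following is only a sketch: find_gaps is a hypothetical helper name, and it assumes a fixed expected interval (60 seconds by default) and that sorting the timestamps first is acceptable.

```python
import pandas as pd


def find_gaps(timestamps: pd.Series, expected: str = "60s") -> pd.DataFrame:
    """Return one row per gap larger than `expected` (hypothetical helper)."""
    step = pd.Timedelta(expected)
    # Sort first so that diff() always compares neighbouring points in time.
    ts = pd.to_datetime(timestamps).sort_values().reset_index(drop=True)
    delta = ts.diff()        # time elapsed since the previous sample
    mask = delta > step      # the NaT in the first row compares as False
    return pd.DataFrame({
        "gap_start": ts.shift(1)[mask],        # last sample before the gap
        "gap_end": ts[mask],                   # first sample after the gap
        "missing": (delta[mask] // step) - 1,  # samples expected in between
    })


times = pd.Series([
    "2020-02-17 10:29:25",
    "2020-02-17 10:30:25",
    "2020-02-17 10:31:25",
    "2020-02-17 10:45:25",
    "2020-02-17 10:46:25",
])
gaps = find_gaps(times)
print(gaps)
```

An empty result means no interval exceeded the expected step.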

You can create a date range and check which of its values are missing from the DataFrame column, like this:

df=pd.DataFrame({'Datetime':[*pd.date_range('2020-02-17 10:29:25', periods=3, freq='T'), *pd.date_range('2020-02-17 10:49:25', periods=3, freq='T')]})
df
             Datetime
0 2020-02-17 10:29:25
1 2020-02-17 10:30:25
2 2020-02-17 10:31:25
3 2020-02-17 10:49:25
4 2020-02-17 10:50:25
5 2020-02-17 10:51:25

my_range = pd.date_range(start=df['Datetime'].min(), end=df['Datetime'].max(), freq='T')

my_range[~my_range.isin(df['Datetime'])]
DatetimeIndex(['2020-02-17 10:32:25', '2020-02-17 10:33:25',
               '2020-02-17 10:34:25', '2020-02-17 10:35:25',
               '2020-02-17 10:36:25', '2020-02-17 10:37:25',
               '2020-02-17 10:38:25', '2020-02-17 10:39:25',
               '2020-02-17 10:40:25', '2020-02-17 10:41:25',
               '2020-02-17 10:42:25', '2020-02-17 10:43:25',
               '2020-02-17 10:44:25', '2020-02-17 10:45:25',
               '2020-02-17 10:46:25', '2020-02-17 10:47:25',
               '2020-02-17 10:48:25'],
              dtype='datetime64[ns]', freq='T')
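Two closely related variants of the date-range comparison are sketched below: a set difference on the index, and a reindex that turns missing timestamps into NaN rows. The 'value' column and its numbers are invented for illustration, the sketch assumes the timestamps are unique, and it uses freq='min' because newer pandas releases deprecate the 'T' alias.

```python
import pandas as pd

# Hypothetical readings: the 'value' column is invented for this example.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2020-02-17 10:29:25",
        "2020-02-17 10:30:25",
        "2020-02-17 10:33:25",
    ]),
    "value": [1.0, 2.0, 3.0],
})

# Every timestamp that should exist between the first and last sample.
full_range = pd.date_range(df["Datetime"].min(), df["Datetime"].max(), freq="min")

# Variant 1: expected timestamps minus observed timestamps.
missing = full_range.difference(pd.DatetimeIndex(df["Datetime"]))

# Variant 2: reindex onto the full range; missing timestamps become NaN rows.
# reindex requires the timestamps to be unique.
filled = df.set_index("Datetime").reindex(full_range)

print(missing)
print(filled[filled["value"].isna()])
```

The reindexed frame is convenient when the gaps should be filled afterwards, for example by interpolation.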

Or you could use this (inspired by @furas):

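# .copy() avoids a SettingWithCopyWarning when the new column is added
df_missing = df[df['Datetime'].diff() > pd.Timedelta('60s')].copy()
# the assignment aligns on the index, so only the selected rows are filled
df_missing['diff'] = df['Datetime'].diff()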
df_missing

             Datetime     diff
3 2020-02-17 10:49:25 00:18:00

Check that every row is exactly one minute earlier than the next one, and do something if it isn't.

import datetime
fmt = '%Y-%m-%d %H:%M:%S'  # assumes 'Datetime' still holds strings and the index is 0..n-1
for i in range(len(df.index) - 1):
    gap = datetime.datetime.strptime(df.loc[i + 1, 'Datetime'], fmt) - datetime.datetime.strptime(df.loc[i, 'Datetime'], fmt)
    if gap != datetime.timedelta(minutes=1):
        print('Data missing')
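The same row-by-row check can be written without an explicit loop. This is a minimal sketch, not part of the answer above: is_complete is a hypothetical helper name, and it assumes the rows are already in chronological order and that the expected interval is exactly 60 seconds.

```python
import pandas as pd


def is_complete(timestamps: pd.Series, expected: str = "60s") -> bool:
    """True if consecutive timestamps are always exactly `expected` apart."""
    # diff() leaves NaT in the first row, so drop it before comparing.
    deltas = pd.to_datetime(timestamps).diff().dropna()
    return bool((deltas == pd.Timedelta(expected)).all())


complete = pd.Series(pd.date_range("2020-02-17 10:29:25", periods=5, freq="min"))
broken = complete.drop(index=2)  # remove one sample to create a gap

print(is_complete(complete))
print(is_complete(broken))
```

Because pd.to_datetime accepts both strings and datetime values, the helper works whether or not the column has already been converted.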
