How do I check a Pandas Datetime column for missing values?
I need to check some data from an instrument and make sure that there are no missing time intervals. For example, I have a DataFrame like this:
I've tried to generate a new datetime Series with pd.date_range('2020-02-17 10:29:25', periods=1440, freq='T') and compare it against the data, but I got stuck.
You can put Datetime in a new column shifted one row down, so each row holds two values (the current timestamp and the previous one). Then you can subtract them and check whether the result is bigger than 1 minute.
import pandas as pd
import datetime
data = {'Datetime': [
'2020-02-17 10:29:25',
'2020-02-17 10:30:25',
'2020-02-17 10:31:25',
'2020-02-17 10:45:25',
'2020-02-17 10:46:25',
]}
df = pd.DataFrame(data)
df['Datetime'] = pd.to_datetime(df['Datetime'])
df['DT2'] = df['Datetime'].shift(1)
df['diff'] = df['Datetime'] - df['DT2']
# all values
print(df)
# compare with 1 minute
print(df[df['diff'] > datetime.timedelta(seconds=60)])
Result:
# all values
Datetime DT2 diff
0 2020-02-17 10:29:25 NaT NaT
1 2020-02-17 10:30:25 2020-02-17 10:29:25 00:01:00
2 2020-02-17 10:31:25 2020-02-17 10:30:25 00:01:00
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
4 2020-02-17 10:46:25 2020-02-17 10:45:25 00:01:00
# compare with 1 minute
Datetime DT2 diff
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
EDIT: see @luigigi's answer for a simpler version, which uses
df[ df['Datetime'].diff() > pd.Timedelta('60s') ]
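A self-contained sketch of that one-liner, using the sample timestamps from the answer above and assuming the instrument should log exactly once per minute. Besides flagging each row that follows a gap, it keeps the previous timestamp and estimates how many readings are missing (gap length divided by one minute, rounded, minus one). The names step, mask and gaps are only illustrative.

```python
import pandas as pd

# Sample timestamps (same as in the answer above)
df = pd.DataFrame({'Datetime': pd.to_datetime([
    '2020-02-17 10:29:25',
    '2020-02-17 10:30:25',
    '2020-02-17 10:31:25',
    '2020-02-17 10:45:25',
    '2020-02-17 10:46:25',
])})

step = pd.Timedelta('60s')      # assumed sampling interval: one reading per minute
delta = df['Datetime'].diff()   # time since the previous row (NaT for the first row)
mask = delta > step             # NaT compares as False, so the first row is never flagged

# Rows that arrive more than one interval after their predecessor
gaps = df.loc[mask, ['Datetime']].copy()
gaps['previous'] = df['Datetime'].shift(1)[mask]
# Estimated number of skipped readings inside each gap
gaps['missing'] = ((delta[mask] / step).round() - 1).astype(int)

print(gaps)
```

Rounding keeps the estimate sensible if the instrument's timestamps jitter by a few seconds; with perfectly regular data it makes no difference.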
You can create a date range and check which of its values are missing from the DataFrame column, like this:
df=pd.DataFrame({'Datetime':[*pd.date_range('2020-02-17 10:29:25', periods=3, freq='T'), *pd.date_range('2020-02-17 10:49:25', periods=3, freq='T')]})
df
Datetime
0 2020-02-17 10:29:25
1 2020-02-17 10:30:25
2 2020-02-17 10:31:25
3 2020-02-17 10:49:25
4 2020-02-17 10:50:25
5 2020-02-17 10:51:25
my_range = pd.date_range(start=df['Datetime'].min(), end=df['Datetime'].max(), freq='T')
my_range[~my_range.isin(df['Datetime'])]
DatetimeIndex(['2020-02-17 10:32:25', '2020-02-17 10:33:25',
'2020-02-17 10:34:25', '2020-02-17 10:35:25',
'2020-02-17 10:36:25', '2020-02-17 10:37:25',
'2020-02-17 10:38:25', '2020-02-17 10:39:25',
'2020-02-17 10:40:25', '2020-02-17 10:41:25',
'2020-02-17 10:42:25', '2020-02-17 10:43:25',
'2020-02-17 10:44:25', '2020-02-17 10:45:25',
'2020-02-17 10:46:25', '2020-02-17 10:47:25',
'2020-02-17 10:48:25'],
dtype='datetime64[ns]', freq='T')
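One limitation of building the range from min() and max(): readings missing at the very start or end of the recording window cannot be detected, because the range only spans the data that is actually present. If the expected window is known in advance (as with the question's periods=1440 starting at 10:29:25), a sketch could build the grid from that window and use DatetimeIndex.difference to list what never arrived. The sample data below is made up for illustration, and 'min' is used as the minute alias because recent pandas versions deprecate 'T'.

```python
import pandas as pd

# Expected grid taken from the question: one reading per minute for one day
expected = pd.date_range('2020-02-17 10:29:25', periods=1440, freq='min')

# Hypothetical data: 10:32:25, 10:33:25, 10:34:25 and the last ten readings are missing
observed = expected[:3].append(expected[6:-10])
df = pd.DataFrame({'Datetime': observed})

# Everything in the expected window that never arrived (returned sorted)
missing = expected.difference(pd.DatetimeIndex(df['Datetime']))
print(len(missing))  # 13

# For comparison: the min()/max() range only spans the data that is present,
# so the ten readings missing at the end are not reported
inner = pd.date_range(df['Datetime'].min(), df['Datetime'].max(), freq='min')
inner_missing = inner[~inner.isin(df['Datetime'])]
print(len(inner_missing))  # 3
```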
Or you could use this (inspired by @furas):
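diff = df['Datetime'].diff()  # time since the previous row
df_missing = df[diff > pd.Timedelta('60s')].copy()  # .copy() avoids SettingWithCopyWarning
df_missing['diff'] = diff  # aligned on the index, so only the gap rows are filled
df_missing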
Datetime diff
3 2020-02-17 10:49:25 00:18:00
Check that each row is exactly one minute earlier than the next one, and do something if it isn't:
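import datetime
fmt = '%Y-%m-%d %H:%M:%S'  # assumes the Datetime column holds strings in this format
for i in range(len(df.index) - 1):
    # each row should be exactly one minute earlier than the next one
    if datetime.datetime.strptime(df.loc[i, 'Datetime'], fmt) + datetime.timedelta(minutes=1) != datetime.datetime.strptime(df.loc[i + 1, 'Datetime'], fmt):
        print('Data missing')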
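The same check can be done without a Python loop: convert the column with pd.to_datetime (which accepts strings as well as datetimes), take diff(), and test whether every step equals one minute. A minimal sketch, assuming a column named Datetime and the made-up sample data below; is_complete and bad_rows are illustrative names. Like the loop, it uses exact equality, so duplicated or jittered timestamps are flagged too.

```python
import pandas as pd

# Hypothetical sample: the reading at 10:31:25 is missing
df = pd.DataFrame({'Datetime': [
    '2020-02-17 10:29:25',
    '2020-02-17 10:30:25',
    '2020-02-17 10:32:25',
]})

times = pd.to_datetime(df['Datetime'])
steps = times.diff().iloc[1:]          # skip the first row, which has no predecessor
ok = steps == pd.Timedelta(minutes=1)  # True where a row is exactly one minute after the previous one

is_complete = bool(ok.all())
bad_rows = list(ok[~ok].index)         # index labels of rows that follow a gap
if not is_complete:
    print('Data missing before rows:', bad_rows)
```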