How do I check a Pandas Datetime column for missing values?
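You can put the Datetime values in a new column shifted down by one row, so that each row holds both the current and the previous timestamp. You can then subtract the two and check whether the result is greater than 1 minute.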
import pandas as pd
import datetime
data = {'Datetime': [
'2020-02-17 10:29:25',
'2020-02-17 10:30:25',
'2020-02-17 10:31:25',
'2020-02-17 10:45:25',
'2020-02-17 10:46:25',
]}
df = pd.DataFrame(data)
df['Datetime'] = pd.to_datetime(df['Datetime'])
df['DT2'] = df['Datetime'].shift(1)
df['diff'] = df['Datetime'] - df['DT2']
# all values
print(df)
# compare with 1 minute
print(df[df['diff'] > datetime.timedelta(seconds=60)])
Result:
# all values
             Datetime                 DT2     diff
0 2020-02-17 10:29:25                 NaT      NaT
1 2020-02-17 10:30:25 2020-02-17 10:29:25 00:01:00
2 2020-02-17 10:31:25 2020-02-17 10:30:25 00:01:00
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
4 2020-02-17 10:46:25 2020-02-17 10:45:25 00:01:00
# compare with 1 minute
             Datetime                 DT2     diff
3 2020-02-17 10:45:25 2020-02-17 10:31:25 00:14:00
EDIT: for a simpler version, see @luigigi's answer, which uses diff():
df[ df['Datetime'].diff() > pd.Timedelta('60s') ]
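If you also want to know where each gap starts and how long it is, the same diff() idea can be wrapped in a small helper. This is only a sketch: the function name find_gaps and the output column names gap_start, gap_end and length are made up for illustration, and it assumes the column is already a datetime dtype.

```python
import pandas as pd

def find_gaps(df, column="Datetime", expected=pd.Timedelta(minutes=1)):
    """Return one row per gap: last timestamp before it, first after it, its length."""
    ts = df[column].sort_values()
    delta = ts.diff()          # time elapsed since the previous row
    mask = delta > expected    # True where the step is longer than expected
    return pd.DataFrame({
        "gap_start": ts.shift(1)[mask],
        "gap_end": ts[mask],
        "length": delta[mask],
    })

df = pd.DataFrame({"Datetime": pd.to_datetime([
    "2020-02-17 10:29:25",
    "2020-02-17 10:30:25",
    "2020-02-17 10:31:25",
    "2020-02-17 10:45:25",
    "2020-02-17 10:46:25",
])})

gaps = find_gaps(df)
print(gaps)
```

With the sample data this reports a single gap, between 10:31:25 and 10:45:25.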
You can try creating a date range and checking which of its values are missing from the DataFrame column. Like this:
df=pd.DataFrame({'Datetime':[*pd.date_range('2020-02-17 10:29:25', periods=3, freq='T'), *pd.date_range('2020-02-17 10:49:25', periods=3, freq='T')]})
df
Datetime
0 2020-02-17 10:29:25
1 2020-02-17 10:30:25
2 2020-02-17 10:31:25
3 2020-02-17 10:49:25
4 2020-02-17 10:50:25
5 2020-02-17 10:51:25
my_range = pd.date_range(start=df['Datetime'].min(), end=df['Datetime'].max(), freq='T')
my_range[~my_range.isin(df['Datetime'])]
DatetimeIndex(['2020-02-17 10:32:25', '2020-02-17 10:33:25',
'2020-02-17 10:34:25', '2020-02-17 10:35:25',
'2020-02-17 10:36:25', '2020-02-17 10:37:25',
'2020-02-17 10:38:25', '2020-02-17 10:39:25',
'2020-02-17 10:40:25', '2020-02-17 10:41:25',
'2020-02-17 10:42:25', '2020-02-17 10:43:25',
'2020-02-17 10:44:25', '2020-02-17 10:45:25',
'2020-02-17 10:46:25', '2020-02-17 10:47:25',
'2020-02-17 10:48:25'],
dtype='datetime64[ns]', freq='T')
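A variation on the same idea, as a sketch: Index.difference gives the missing timestamps directly, and passing a Timedelta as freq avoids depending on a particular frequency alias string. The variable names observed, expected and missing are only illustrative. Note that this approach (like the isin version) only works when every timestamp sits exactly on the one-minute grid; a row at 10:32:26 would not match 10:32:25.

```python
import pandas as pd

observed = pd.DatetimeIndex([
    "2020-02-17 10:29:25",
    "2020-02-17 10:30:25",
    "2020-02-17 10:31:25",
    "2020-02-17 10:49:25",
    "2020-02-17 10:50:25",
    "2020-02-17 10:51:25",
])
df = pd.DataFrame({"Datetime": observed})

# Every timestamp we would expect to see between the first and last row
expected = pd.date_range(df["Datetime"].min(), df["Datetime"].max(),
                         freq=pd.Timedelta(minutes=1))

# Expected timestamps that never appear in the column
missing = expected.difference(pd.DatetimeIndex(df["Datetime"]))
print(len(missing), missing[0], missing[-1])
```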
Or you can use this (inspired by @furas):
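# .copy() avoids a SettingWithCopyWarning when the new column is added
df_missing = df[df['Datetime'].diff() > pd.Timedelta('60s')].copy()
# the assignment aligns on the index, so only the kept rows get a value
df_missing['diff'] = df['Datetime'].diff()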
df_missing
Datetime diff
3 2020-02-17 10:49:25 00:18:00
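Check whether each row is exactly 1 minute earlier than the next row, and if it is not, do something. This assumes the column still holds strings and that the datetime module has been imported.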
fmt = '%Y-%m-%d %H:%M:%S'  # assumes 'Datetime' still holds strings
for i in range(len(df.index) - 1):
    current = datetime.datetime.strptime(df.loc[i, 'Datetime'], fmt)
    following = datetime.datetime.strptime(df.loc[i + 1, 'Datetime'], fmt)
    if current != following - datetime.timedelta(minutes=1):
        print('Data missing')
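The same row-by-row check can probably be done without parsing every string twice. Here is a rough vectorized sketch, assuming the column holds strings in the format above; the names raw, ts, step and irregular are just for illustration. Because it tests for "not exactly one minute" rather than "more than one minute", it also flags duplicated or out-of-order rows.

```python
import pandas as pd

raw = pd.DataFrame({"Datetime": [
    "2020-02-17 10:29:25",
    "2020-02-17 10:30:25",
    "2020-02-17 10:31:25",
    "2020-02-17 10:45:25",
    "2020-02-17 10:46:25",
]})

# Parse the strings once instead of calling strptime inside a loop
ts = pd.to_datetime(raw["Datetime"], format="%Y-%m-%d %H:%M:%S")

# Step from the previous row; drop the first row, which has no predecessor (NaT)
step = ts.diff().iloc[1:]

# Rows whose step is anything other than exactly one minute
irregular = step[step != pd.Timedelta(minutes=1)]
for i in irregular.index:
    print(f"Data missing before row {i}: {ts.loc[i - 1]} -> {ts.loc[i]}")
```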