
Checking whether there is a date skip in the data

I have a dataset whose index consists of timestamps. It is a pandas Series like the one below:

Time                           
2013-09-17 22:08:11           0
2013-09-17 22:08:18           0
2013-09-17 22:08:26           0
2013-09-17 22:08:34           0
2013-09-17 22:08:42           0
2013-09-17 22:08:50           0
2013-09-17 22:08:58           0
2013-09-17 22:09:06           0
2013-09-17 22:09:11           0
2013-09-17 22:09:13           0
2013-09-17 22:09:19           0
2013-09-17 22:09:21           0
2013-09-17 22:09:27           0
2013-09-17 22:09:35           0
2013-09-17 22:09:43           0
Name: dummy_frame, dtype: float64

The data are recorded at irregular intervals. I want to check whether there is a date skip or jump in the data, for example from 2013-09-07 straight to 2013-12-22. I can tell that a jump exists by checking the first and last dates and comparing them, but I also need to find where the jump occurs. Is there an easy way to find that out?

Thank you.

IIUC:

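import pandas as pd
x = dummy_frame  # your series
x.index = pd.to_datetime(x.index)
# Calendar-day difference between each timestamp and the previous one (NaN for the first)
jumps = x.index.to_series().dt.normalize().diff().dt.days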

This creates a series where jumps[i] is the difference in days between the date of entry i and the date of entry i-1. To find where a jump of more than one day occurs, just do:

x[jumps > 1]
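
For reference, here is a self-contained sketch of the same idea that also reports both sides of each jump. It assumes the index is already sorted in time order; the helper name find_date_jumps, the min_days parameter, and the sample timestamps are made up for illustration and are not part of the original question.

```python
import pandas as pd

def find_date_jumps(series, min_days=1):
    """Hypothetical helper: list the places where consecutive index entries
    are more than min_days calendar days apart."""
    idx = pd.to_datetime(series.index)
    # Strip the time of day so only calendar dates are compared
    dates = pd.Series(idx.normalize(), index=idx)
    gap_days = dates.diff().dt.days  # NaN for the first entry
    mask = (gap_days > min_days).to_numpy()
    return pd.DataFrame({
        "previous": idx[:-1][mask[1:]],  # last timestamp before the jump
        "current": idx[1:][mask[1:]],    # first timestamp after the jump
        "gap_days": gap_days.to_numpy()[mask].astype(int),
    })

# Made-up sample data with one jump from 2013-09-07 to 2013-12-22
sample = pd.Series(
    0.0,
    index=pd.to_datetime([
        "2013-09-07 22:08:11",
        "2013-09-07 22:08:18",
        "2013-12-22 09:00:00",
        "2013-12-22 09:00:08",
    ]),
    name="dummy_frame",
)
found = find_date_jumps(sample)
print(found)
```

Each row of the result gives the timestamps on either side of a jump, so you can see where the data stops and where it resumes.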

I believe you could simply create a date range with the same date format and compare both lists:

from datetime import datetime,timedelta

start_date = datetime.strptime("2013-09-07","%Y-%m-%d")
end_date = datetime.strptime("2013-12-22","%Y-%m-%d")

# This will create a list with complete dates
completeDates = [start_date + timedelta(days=x) for x in range(0, (end_date - start_date).days + 1)]
completeDates = [d.strftime("%Y-%m-%d") for d in completeDates] # Convert date to string

# Get your list from data frame index, and remove hours
myDates = dummy_frame.index.tolist()

# Your dates may be strings or datetime objects; keep only the branch that applies
# If strings:
# myDates = [d.split()[0] for d in myDates]
# If datetime objects:
myDates = [d.strftime("%Y-%m-%d") for d in myDates]

# Creates a list with missing data
missingDates = [d for d in completeDates if d not in myDates]

missingDates will then be a list containing all the dates missing from your data frame, i.e. the jumps. Please let me know if this helps!
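
The same comparison can probably be written more compactly with pandas itself. The following is only a sketch: the function name missing_dates and the sample timestamps are invented for illustration, and it assumes the index values can be parsed by pd.to_datetime.

```python
import pandas as pd

def missing_dates(index, start=None, end=None):
    """Hypothetical helper: calendar days between start and end with no record."""
    # Reduce every timestamp to its calendar day and drop duplicates
    days = pd.to_datetime(index).normalize().unique()
    start = days.min() if start is None else pd.Timestamp(start)
    end = days.max() if end is None else pd.Timestamp(end)
    # Every day in the range that does not appear in the data
    return pd.date_range(start, end, freq="D").difference(days)

# Made-up timestamps: nothing recorded on 2013-09-09 and 2013-09-10
stamps = ["2013-09-07 22:08:11", "2013-09-08 10:00:00", "2013-09-11 07:30:00"]
gaps = missing_dates(stamps)
print(list(gaps.strftime("%Y-%m-%d")))
```

With your data you would pass dummy_frame.index instead of the sample list. If start and end are omitted, the first and last dates in the data are used as the range.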
