
Checking whether there is a date skip in the data

I have a dataset whose index consists of timestamps. It is a pandas Series, like the one below:

Time                           
2013-09-17 22:08:11           0
2013-09-17 22:08:18           0
2013-09-17 22:08:26           0
2013-09-17 22:08:34           0
2013-09-17 22:08:42           0
2013-09-17 22:08:50           0
2013-09-17 22:08:58           0
2013-09-17 22:09:06           0
2013-09-17 22:09:11           0
2013-09-17 22:09:13           0
2013-09-17 22:09:19           0
2013-09-17 22:09:21           0
2013-09-17 22:09:27           0
2013-09-17 22:09:35           0
2013-09-17 22:09:43           0
Name: dummy_frame, dtype: float64

The data are recorded at irregular intervals. What I want to do now is check whether there is a date skip or jump inside the data, such as from 2013-09-07 to 2013-12-22. I could simply check the first and last dates and compare them. However, I need to find where the jump occurs. Is there an easy way to find it?

Thank you.

IIUC (if I understand correctly):

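import pandas as pd

x = dummy_frame  # your Series
x.index = pd.to_datetime(x.index)
# Calendar-day difference between each timestamp and the previous one
jumps = pd.Series(x.index.normalize(), index=x.index).diff()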

This creates a Series where jumps[i] is the difference between the calendar date of timestamp i and that of timestamp i-1. To find where the jump is greater than one day, just do:

x[jumps > pd.Timedelta(days=1)]
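
As a self-contained illustration of the idea above, here is a minimal sketch that also reports the dates on either side of each jump. The sample timestamps and the names `days`, `mask` and `report` are made up for the example; only the Series name `dummy_frame` comes from the question.

```python
import pandas as pd

# Hypothetical sample data: irregular timestamps with one large gap.
idx = pd.to_datetime([
    "2013-09-07 22:08:11",
    "2013-09-07 22:08:18",
    "2013-09-08 01:15:00",
    "2013-12-22 09:30:00",
    "2013-12-22 09:30:08",
])
x = pd.Series(0.0, index=idx, name="dummy_frame")

# Calendar day of each timestamp (time of day dropped), aligned to x.
days = pd.Series(x.index.normalize(), index=x.index)

# Difference between each day and the previous row's day (first is NaT).
jumps = days.diff()

# Rows that come right after a gap of more than one day.
mask = jumps > pd.Timedelta(days=1)

# One row per jump: the day before the gap, the day after, and its size.
report = pd.DataFrame({
    "previous": days.shift(1)[mask],
    "current": days[mask],
    "gap": jumps[mask],
})
print(report)
```

With this sample, `report` has a single row, indexed by the first timestamp after the gap (2013-12-22 09:30:00), with a gap of 105 days. If gaps shorter than a day matter as well, `x.index.to_series().diff()` compares the full timestamps instead of the calendar days.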

I believe you could simply create a date range in the same date format and compare the two lists:

from datetime import datetime, timedelta

start_date = datetime.strptime("2013-09-07", "%Y-%m-%d")
end_date = datetime.strptime("2013-12-22", "%Y-%m-%d")

# Build the complete list of dates, one per day, as "YYYY-MM-DD" strings
n_days = (end_date - start_date).days + 1
completeDates = [(start_date + timedelta(days=x)).strftime("%Y-%m-%d") for x in range(n_days)]

# Get the list of timestamps from the Series index
myDates = dummy_frame.index.tolist()

# The index may hold strings or datetime objects; keep only the date part
# If string: drop the time after the space
# If datetime: format as "YYYY-MM-DD"
myDates = [d.split()[0] if isinstance(d, str) else d.strftime("%Y-%m-%d") for d in myDates]

# Create a list of the dates that never appear in the data
presentDates = set(myDates)  # set membership is faster than scanning a list
missingDates = [d for d in completeDates if d not in presentDates]

With this, missingDates will be a list containing all the dates that are missing from your data, that is, the days skipped by a jump. Please let me know if this helps!
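
The same comparison can probably be written more compactly with pandas itself. The following is only a sketch under assumptions: the sample timestamps are invented, and the range is taken from the first and last observed days rather than from fixed start and end dates.

```python
import pandas as pd

# Hypothetical sample index with a gap between 2013-09-08 and 2013-09-11.
idx = pd.to_datetime([
    "2013-09-07 22:08:11",
    "2013-09-08 01:15:00",
    "2013-09-11 09:30:00",
])
dummy_frame = pd.Series(0.0, index=idx, name="dummy_frame")

# Days that actually appear in the data (time of day dropped).
present = dummy_frame.index.normalize().unique()

# Every calendar day between the first and last observation.
complete = pd.date_range(present.min(), present.max(), freq="D")

# Days in the full range that have no observation at all.
missing = complete.difference(present)
missing_dates = [d.strftime("%Y-%m-%d") for d in missing]
print(missing_dates)
```

Here `missing_dates` comes out as `['2013-09-09', '2013-09-10']`. Note that this lists every skipped day, so a long jump produces a long list; grouping consecutive missing days would be needed to report each jump only once.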


 