Insert rows for missing time (format HH:MM:SS) in R
I am fairly new to R and am trying to determine whether I can use it to fill in missing values in a number of large data sets I am working with. I'll try to explain it to the best of my abilities.
The data set I am working with has time data in the format HH:MM:SS. It is irregular in that no two data sets have the same time stamps, and the time stamp entries record events over a 2 hour period. It looks something like this.
Date, Time_hms, Event
9/22/2015, 00:00:00, 5
9/22/2015, 00:00:24, 1
9/22/2015, 00:00:24, 4
9/22/2015, 00:01:42, 7
9/22/2015, 00:02:04, 3
9/22/2015, 00:02:35, 2
9/22/2015, 00:03:02, 4
What I would like to do is add the missing rows at one-minute intervals, so that it looks like this.
Date, Time_hms, Event
9/22/2015, 00:00:00, 5
9/22/2015, 00:00:24, 1
9/22/2015, 00:00:24, 4
9/22/2015, 00:01:00, 4 # Summary row to be inserted
9/22/2015, 00:01:42, 7
9/22/2015, 00:02:00, 7 # Summary row to be inserted
9/22/2015, 00:02:04, 3
9/22/2015, 00:02:35, 2
9/22/2015, 00:03:00, 2 # Summary row to be inserted
9/22/2015, 00:03:02, 4
If possible, I would like the rows to be filled in with the event that occurred during that range.
In trying to solve this, I found and tried the approach in "Insert rows for missing dates/times". I tried using POSIXct but was unsuccessful because of the date format. I have also considered padr and fill_by_function, but am uncertain if that is the correct approach. Is there a method to work strictly with HH:MM:SS format?
Again, I am only just learning R and am unsure of how to approach this. Any help or suggestions would be greatly appreciated!
Edit: Hopefully I did this correctly. Thank you again!
dput(elements)
structure(list(var1 = c("Date", "9/22/2015", "9/22/2015", "9/22/2015",
"9/22/2015", "9/22/2015", "9/22/2015", "9/22/2015"), var2 = c("Time_hms",
"00:00:00", "00:00:24", "00:00:24", "00:01:42", "00:02:04", "00:02:35",
"00:03:02"), var3 = c("Event", "5", "1", "4", "7", "3", "2",
"4")), .Names = c("var1", "var2", "var3"), row.names = c(NA,
8L), class = "data.frame")
Okay, your dput data had the headers in the first row. So we'll address that issue first:
names(elements) = elements[1, ]
elements = elements[-1, ]
elements$Event = as.numeric(elements$Event)
Now we'll convert the dates and times to a POSIX datetime (in a separate vector), then we'll take the full range of the data and round it to the nearest minute. We can then create a sequence of every minute from the first to the last (and omit the date so it's the same format):
time_range = round(range(strptime(paste(elements$Date, elements$Time_hms), format = "%m/%d/%Y %H:%M:%S")), "mins")
each_minute = seq(from = time_range[1], to = time_range[2], by = "min")
each_minute = format(each_minute, "%H:%M:%S")
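One detail of the rounding step is easy to miss: round() moves each endpoint to the nearest minute, so the last grid point can land after the final observation, while trunc() always moves down to the start of the minute. The sketch below compares the two on made-up timestamps. The stamps vector and tz = "UTC" are assumptions for illustration, not part of the original data.

```r
# Hypothetical timestamps; tz = "UTC" is an assumption that sidesteps DST shifts.
stamps <- as.POSIXct(
  c("9/22/2015 00:00:24", "9/22/2015 00:03:42"),
  format = "%m/%d/%Y %H:%M:%S", tz = "UTC"
)

# round() moves each endpoint to the nearest minute ...
rounded <- as.POSIXct(round(range(stamps), "mins"))
# ... while trunc() always moves it down to the start of its minute.
truncated <- as.POSIXct(trunc(range(stamps), units = "mins"))

# Build the minute grids and drop the date part for display.
rounded_grid <- format(seq(rounded[1], rounded[2], by = "min"),
                       "%H:%M:%S", tz = "UTC")
truncated_grid <- format(seq(truncated[1], truncated[2], by = "min"),
                         "%H:%M:%S", tz = "UTC")

rounded_grid    # "00:00:00" ... "00:04:00" (ends after the last observation)
truncated_grid  # "00:00:00" ... "00:03:00"
```

With the question's data the two choices give the same grid, because the last timestamp (00:03:02) rounds down. Which one is appropriate depends on whether a trailing row after the final event is wanted.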
Finally, we merge these results back into the original data, order the rows, and use zoo::na.locf to fill in the missing values with the previous observation.
result = merge(elements, data.frame(Time_hms = each_minute), all = TRUE)
result = result[order(result$Time_hms), ]
result$Date = zoo::na.locf(result$Date)
result$Event = zoo::na.locf(result$Event)
result
# Time_hms Date Event
# 1 00:00:00 9/22/2015 5
# 2 00:00:24 9/22/2015 1
# 3 00:00:24 9/22/2015 4
# 4 00:01:00 9/22/2015 4
# 5 00:01:42 9/22/2015 7
# 6 00:02:00 9/22/2015 7
# 7 00:02:04 9/22/2015 3
# 8 00:02:35 9/22/2015 2
# 9 00:03:00 9/22/2015 2
# 10 00:03:02 9/22/2015 4
In general, and especially if your data might include different dates, you might find the data easier to work with if you add a new column holding the POSIX datetime object. There's not a good class in R for dealing with times without dates (at least not in base R), but you have dates! And there are lots of functions that work well for dealing with datetimes, like the seq and round I used in this answer.
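As a rough sketch of that suggestion, the same fill can be done on a full datetime column, which keeps working when the data spans more than one date. Several things here are assumptions rather than part of the answer above: the datetime column name, tz = "UTC", the use of trunc() for the grid endpoints, and the fill_forward helper, which is a hypothetical base-R stand-in for zoo::na.locf.

```r
# Example data rebuilt with proper headers (same values as the dput above).
elements <- data.frame(
  Date     = rep("9/22/2015", 7),
  Time_hms = c("00:00:00", "00:00:24", "00:00:24", "00:01:42",
               "00:02:04", "00:02:35", "00:03:02"),
  Event    = c(5, 1, 4, 7, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Assumption: tz = "UTC", so no daylight-saving jumps affect the minute grid.
elements$datetime <- as.POSIXct(
  paste(elements$Date, elements$Time_hms),
  format = "%m/%d/%Y %H:%M:%S", tz = "UTC"
)

# One row per whole minute between the first and last observation.
first_minute <- as.POSIXct(trunc(min(elements$datetime), units = "mins"))
last_minute  <- as.POSIXct(trunc(max(elements$datetime), units = "mins"))
minute_grid  <- data.frame(datetime = seq(first_minute, last_minute, by = "min"))

# Hypothetical helper: carry the last non-NA value forward (base-R stand-in
# for zoo::na.locf; leading NAs are left as NA).
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the most recent non-NA value
  idx[idx == 0] <- NA        # nothing to carry forward yet
  x[!is.na(x)][idx]
}

# Add the grid rows, sort chronologically, and fill the new Event values.
result <- merge(elements, minute_grid, by = "datetime", all = TRUE)
result <- result[order(result$datetime), ]
result$Event <- fill_forward(result$Event)

# Rebuild the display columns from the datetime, so inserted rows get the
# right date even when the data crosses midnight (month is zero-padded here).
result$Date     <- format(result$datetime, "%m/%d/%Y", tz = "UTC")
result$Time_hms <- format(result$datetime, "%H:%M:%S", tz = "UTC")
```

Sorting and filling on the datetime rather than on the HH:MM:SS string matters once a second date appears, since 23:59:00 on one day would otherwise sort after 00:01:00 on the next.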