Create a time interval of 15 minutes from minutely data in R?
I have some data which is formatted in the following way:
time count
00:00 17
00:01 62
00:02 41
So I have data from 00:00 to 23:59, with one count per minute. I'd like to group the data in intervals of 15 minutes such that:
time count
00:00-00:15 148
00:16-00:30 284
I have tried to do it manually, but this is exhausting, so I am sure there has to be a function or something to do it easily, but I haven't yet figured out how to do it.
I'd really appreciate some help!
Thank you very much!
For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.
First, create some fake data:
set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))
Base R
Cut the data into 15-minute groups:
dat$by15 = cut(dat$time, breaks="15 min")
                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00
Now aggregate by the new grouping column, using sum as the aggregation function:
dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341
dplyr
library(dplyr)
dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
summarise(count=sum(count))
data.table
library(data.table)
dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]
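The question's data stores time as "HH:MM" strings rather than POSIXct, so it needs one conversion step before cut can be applied. The following is a minimal base R sketch of that step. The data frame df, its placeholder counts, and the date 2016-05-01 are all assumptions made for illustration; any date works, since only the time of day matters here.

```r
# Hypothetical data in the question's layout: one row per minute, "HH:MM" strings
minutes <- 0:(24 * 60 - 1)
df <- data.frame(
  time  = sprintf("%02d:%02d", minutes %/% 60, minutes %% 60),
  count = rep(1L, length(minutes)),  # placeholder counts, one per minute
  stringsAsFactors = FALSE
)

# Attach an arbitrary date (an assumption) so the strings parse as POSIXct
df$stamp <- as.POSIXct(paste("2016-05-01", df$time),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

# Group into 15-minute bins and sum the counts in each bin
df$by15 <- cut(df$stamp, breaks = "15 min")
result <- aggregate(count ~ by15, data = df, FUN = sum)

head(result)
```

With a full day of minutely rows this yields 96 bins, each covering 15 consecutive minutes.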
UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the end point of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is needed because cut returns a factor, and this just converts it back to date-time so that we can do math on it.
If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could do as.POSIXct(as.character(dat$by15)) + 60*14.
If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
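The end-point arithmetic can also be used to build range labels like the ones the question asks for. This is a small sketch under stated assumptions: the starts vector is constructed directly with seq rather than taken from the cut output, and the date is arbitrary.

```r
# Starts of the first four 15-minute bins (the date is an arbitrary assumption)
starts <- seq(as.POSIXct("2016-05-01 00:00:00", tz = "UTC"),
              by = "15 min", length.out = 4)

# POSIXct arithmetic is in seconds: the last second of each bin is
# 15 minutes minus one second after its start
ends <- starts + 60 * 15 - 1

# Build "HH:MM-HH:MM" labels; formatting to minutes drops the seconds
labels <- paste0(format(starts, "%H:%M"), "-", format(ends, "%H:%M"))
print(labels)
# [1] "00:00-00:14" "00:15-00:29" "00:30-00:44" "00:45-00:59"
```

Each label ends one minute before the next bin starts, so no minute is counted in two intervals.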
You can do it in one line by using the trs function from FQOAT, like this:
df_15mins=trs(df, "15 mins")
The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records).
# Function: Truncate (floor) POSIXct to time interval (specified in seconds)
# Author: Stephen McDaniel @ PowerTrip Analytics
# Date : 2017MAY
# Copyright: (C) 2017 by Freakalytics, LLC
# License: MIT
floor_datetime <- function(date_var, floor_seconds = 60,
                           origin = "1970-01-01") { # defaults to minute rounding
  if (!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
  # floor() is vectorized and propagates NA, so no separate NA check is needed
  # (is.na() inside if() would fail on vectors longer than one element)
  as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
             origin = origin)
}
Sample output:
test <- data.frame(good = as.POSIXct(Sys.time()),
bad1 = as.Date(Sys.time()),
bad2 = as.POSIXct(NA))
test$good_15 <- floor_datetime(test$good, 15 * 60)
test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
Error in floor_datetime(test$bad1, 15 * 60) :
Please pass in a POSIXct variable
test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)
test
                    good       bad1 bad2             good_15 bad2_15
1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
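The flooring idea can be carried through to the aggregation step as well. The sketch below is one possible way to do that in base R, not the author's benchmarked code: it bins on the raw epoch seconds and converts back to POSIXct only at the end. The data, the seed, and the names bin_seconds, bin and start are assumptions made for illustration.

```r
# Hypothetical minutely data: 100 rows starting at midnight UTC
set.seed(1)
dat <- data.frame(
  time  = seq(as.POSIXct("2016-05-01 00:00:00", tz = "UTC"),
              by = 60, length.out = 100),
  count = sample(1:50, 100, replace = TRUE)
)

# Floor each timestamp to the start of its 15-minute (900-second) bin,
# working on epoch seconds so no factor conversion is involved
bin_seconds <- 15 * 60
dat$bin <- floor(as.numeric(dat$time) / bin_seconds) * bin_seconds

# Sum the counts per bin, then turn the bin starts back into date-times
res <- aggregate(count ~ bin, data = dat, FUN = sum)
res$start <- as.POSIXct(res$bin, origin = "1970-01-01", tz = "UTC")

res[, c("start", "count")]
```

Grouping on a plain numeric column avoids creating a factor with one level per bin, which is a plausible reason for the speed difference reported above.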