Create a time interval of 15 minutes from minutely data in R?

I have some data which is formatted in the following way:

time     count 
00:00    17
00:01    62
00:02    41

So I have data from 00:00 to 23:59, with a count per minute. I'd like to group the data into intervals of 15 minutes, such that:

time           count
00:00-00:15    148   
00:16-00:30    284

I have tried to do it manually, but this is exhausting, so I am sure there has to be a function or something to do it easily. I just haven't figured out yet how to do it.

I'd really appreciate some help!

Thank you very much!

For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))

Base R

Cut the data into 15-minute groups:

dat$by15 = cut(dat$time, breaks="15 min")
                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)
                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341

dplyr

library(dplyr)

dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
  summarise(count=sum(count))

data.table

library(data.table)

dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]
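The examples above assume a POSIXct column, whereas the data in the question are plain "HH:MM" strings. As a minimal sketch for that layout (the data frame dat_hm, its all-ones counts, and the helper fmt_hm are made up for illustration), you could convert each time to minutes since midnight and bin with integer division:

```r
# Hypothetical data in the question's layout: one row per minute of the day
minute_of_day <- 0:1439
dat_hm <- data.frame(
  time  = sprintf("%02d:%02d", minute_of_day %/% 60L, minute_of_day %% 60L),
  count = 1L,  # placeholder counts; real data would have the per-minute counter
  stringsAsFactors = FALSE
)

# Convert "HH:MM" to minutes since midnight
parts <- strsplit(dat_hm$time, ":", fixed = TRUE)
mins  <- vapply(parts,
                function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
                integer(1))

# Start of the 15-minute bin each minute falls into
bin_start <- (mins %/% 15L) * 15L

# Illustrative helper: format minutes since midnight back to "HH:MM"
fmt_hm <- function(m) sprintf("%02d:%02d", m %/% 60L, m %% 60L)

# Label each row with "start-end", where end is the last minute in the bin
dat_hm$interval <- paste(fmt_hm(bin_start), fmt_hm(bin_start + 14L), sep = "-")

hm_summary <- aggregate(count ~ interval, data = dat_hm, FUN = sum)
```

Each bin here covers 15 one-minute rows, so the labels run from 00:00-00:14 to 23:45-23:59 and no minute is counted in two intervals.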

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the end point of the grouping interval is 15 minutes minus one second after the start of the interval. We add 60*15 - 1 because POSIXct values are measured in seconds. The as.POSIXct(as.character(...)) is needed because cut returns a factor, and this converts it back to a date-time so that we can do math on it.

If you want the end point to be the last whole minute before the next interval (instead of the last second), you could do as.POSIXct(as.character(dat$by15)) + 60*14.

If you don't know the break interval, for example because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.

You can do it in one line by using the trs function from FQOAT, like this:

df_15mins=trs(df, "15 mins")

The cut approach is handy but slow with large data frames. The following approach is approximately 1,000x faster than the cut approach (tested with 400k records).

  #     Function: Truncate (floor) POSIXct to time interval (specified in seconds)
  #       Author: Stephen McDaniel @ PowerTrip Analytics
  #        Date : 2017MAY
  #    Copyright: (C) 2017 by Freakalytics, LLC
  #      License: MIT

  floor_datetime <- function(date_var, floor_seconds = 60, 
        origin = "1970-01-01") { # defaults to minute rounding
     if(!inherits(date_var, "POSIXct")) stop("Please pass in a POSIXct variable")
     # floor() and as.POSIXct() propagate NA, so no separate NA branch is needed.
     # Dropping the scalar is.na() test also keeps the function vectorized:
     # if() with a condition of length > 1 is an error in R >= 4.2.
     as.POSIXct(floor(as.numeric(date_var) / floor_seconds) * floor_seconds,
        origin = origin)
  }

Sample output:

test <- data.frame(good = as.POSIXct(Sys.time()), 
   bad1 = as.Date(Sys.time()),
   bad2 = as.POSIXct(NA))

test$good_15 <- floor_datetime(test$good, 15 * 60)
test$bad1_15 <- floor_datetime(test$bad1, 15 * 60)
Error in floor_datetime(test$bad1, 15 * 60) : 
  Please pass in a POSIXct variable
test$bad2_15 <- floor_datetime(test$bad2, 15 * 60)

test

                        good       bad1 bad2             good_15 bad2_15
    1 2017-05-06 13:55:34.48 2017-05-06 <NA> 2017-05-06 13:45:00    <NA>
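
To connect this back to the original question, here is a minimal sketch of using the same flooring arithmetic to sum counts per 15-minute bin. The data frame dat2, the UTC time zone, and the column names by15 and end15 are assumptions for illustration, and the flooring is written inline so that the snippet runs on its own:

```r
# Hypothetical data: 100 consecutive minutes with random counts
set.seed(4984)
dat2 <- data.frame(
  time  = seq(as.POSIXct("2016-05-01 00:00:00", tz = "UTC"),
              by = 60, length.out = 100),
  count = sample(1:50, 100, replace = TRUE)
)

# Floor each timestamp to the start of its 15-minute bin by doing
# arithmetic on seconds since the epoch (no factor round trip needed)
bin_seconds <- 15 * 60
dat2$by15 <- as.POSIXct(
  floor(as.numeric(dat2$time) / bin_seconds) * bin_seconds,
  origin = "1970-01-01", tz = "UTC"
)

# Last second of each bin, matching the end point described in the UPDATE
dat2$end15 <- dat2$by15 + bin_seconds - 1

# Sum the counts within each bin
floor_summary <- aggregate(count ~ by15, data = dat2, FUN = sum)
```

Because by15 stays a POSIXct column rather than a factor, the end point can be computed directly with by15 + bin_seconds - 1.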
