Apply functions to hourly data in R
I have the following data in a HISTORY table, with these column names:
ID, START_TIME, END_TIME, VALUE
51,2015-04-17 01:00:00,2015-04-17 01:10:00,98
51,2015-04-17 01:10:00,2015-04-17 01:20:00,96
51,2015-04-17 01:20:00,2015-04-17 01:30:00,97
51,2015-04-17 01:30:00,2015-04-17 01:40:00,99
51,2015-04-17 01:40:00,2015-04-17 01:50:00,98
51,2015-04-17 01:50:00,2015-04-17 02:00:00,105
51,2015-04-17 02:00:00,2015-04-17 02:10:00,103
51,2015-04-17 02:10:00,2015-04-17 02:20:00,101
51,2015-04-17 02:20:00,2015-04-17 02:30:00,100
51,2015-04-17 02:30:00,2015-04-17 02:40:00,104
51,2015-04-17 02:40:00,2015-04-17 02:50:00,102
51,2015-04-17 02:50:00,2015-04-17 03:00:00,98
51,2015-04-17 03:00:00,2015-04-17 03:10:00,97
51,2015-04-17 03:10:00,2015-04-17 03:20:00,96
51,2015-04-17 03:20:00,2015-04-17 03:30:00,99
51,2015-04-17 03:30:00,2015-04-17 03:40:00,100
51,2015-04-17 03:40:00,2015-04-17 03:50:00,101
51,2015-04-17 03:50:00,2015-04-17 04:00:00,102
51,2015-04-17 04:00:00,2015-04-17 04:10:00,99
51,2015-04-17 04:10:00,2015-04-17 04:20:00,104
51,2015-04-17 04:20:00,2015-04-17 04:30:00,105
51,2015-04-17 04:30:00,2015-04-17 04:40:00,103
51,2015-04-17 04:40:00,2015-04-17 04:50:00,98
51,2015-04-17 04:50:00,2015-04-17 05:00:00,97
51,2015-04-17 05:00:00,2015-04-17 05:10:00,101
51,2015-04-17 05:10:00,2015-04-17 05:20:00,103
51,2015-04-17 05:20:00,2015-04-17 05:30:00,101
51,2015-04-17 05:30:00,2015-04-17 05:40:00,105
51,2015-04-17 05:40:00,2015-04-17 05:50:00,102
51,2015-04-17 05:50:00,2015-04-17 06:00:00,98
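I want to apply a function such as max() to the VALUE column, but at a given frequency. If the frequency is, say, 1 hour, the function would be applied to 5 separate sets of rows,
for example from start time 2015-04-17 01:00:00 to 2015-04-17 02:00:00, and so on. How can I achieve this in R? The final output should look like this: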
51, 2015-04-17 02:00:00, 105
51, 2015-04-17 03:00:00, 104
51, 2015-04-17 04:00:00, 102
51, 2015-04-17 05:00:00, 105
51, 2015-04-17 06:00:00, 105
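The columns above are ID, the time up to which max() was calculated (the end of the hour), and the result of max() over VALUE within that hour. How can I achieve this in R? Should I use intervals, or something else?
Thanks.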
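Here is another way, using data.table:
library(data.table)
# Parsing with "%F %H" drops minutes and seconds; adding 3600 labels each hour by its end
setDT(df)[, .(MAX_VALUE = max(VALUE)),
          by = .(ID, START_TIME = as.POSIXct(START_TIME, format = "%F %H") + 3600)]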
# ID START_TIME MAX_VALUE
# 1: 51 2015-04-17 02:00:00 105
# 2: 51 2015-04-17 03:00:00 104
# 3: 51 2015-04-17 04:00:00 102
# 4: 51 2015-04-17 05:00:00 105
# 5: 51 2015-04-17 06:00:00 105
Or a similar solution with no package dependencies:
df$START_TIME2 <- as.POSIXct(df$START_TIME, format = "%F %H") + 3600
aggregate(VALUE ~ ID + START_TIME2, df, max)
# ID START_TIME2 VALUE
# 1 51 2015-04-17 02:00:00 105
# 2 51 2015-04-17 03:00:00 104
# 3 51 2015-04-17 04:00:00 102
# 4 51 2015-04-17 05:00:00 105
# 5 51 2015-04-17 06:00:00 105
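Since the question talks about "a given frequency", the same idea can be stretched to any window width. The following is only a sketch in base R: the function name `bin_max`, its `width_secs` argument and the `WINDOW_END` column are made up for illustration, and it assumes START_TIME is stored as character and can be read as UTC.

```r
# Sketch: maximum of VALUE per fixed-width time window, base R only.
# `bin_max`, `width_secs` and `WINDOW_END` are hypothetical names.
bin_max <- function(df, width_secs = 3600, tz = "UTC") {
  t <- as.POSIXct(df$START_TIME, tz = tz)
  # Floor each start time to its window, then label the window by its end.
  secs <- floor(as.numeric(t) / width_secs) * width_secs + width_secs
  df$WINDOW_END <- as.POSIXct(secs, origin = "1970-01-01", tz = tz)
  aggregate(VALUE ~ ID + WINDOW_END, data = df, FUN = max)
}

# First two hours of the question's data, with START_TIME as character.
start <- seq(as.POSIXct("2015-04-17 01:00:00", tz = "UTC"),
             by = "10 min", length.out = 12)
df <- data.frame(
  ID = 51,
  START_TIME = format(start, "%Y-%m-%d %H:%M:%S"),
  VALUE = c(98, 96, 97, 99, 98, 105, 103, 101, 100, 104, 102, 98)
)

hourly <- bin_max(df)                     # one row per hour
half   <- bin_max(df, width_secs = 1800)  # one row per 30 minutes
```

Flooring on epoch seconds lines the windows up with UTC, so for time zones whose offset is not a whole number of windows the boundaries would need adjusting.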
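You can try:
library(dplyr)
# as.POSIXct() lets this work whether START_TIME is stored as character or as POSIXct
HISTORY %>%
  group_by(ID, TIME = format(as.POSIXct(START_TIME) + 60*60, "%Y-%m-%d %H:00:00")) %>%
  summarise(MAX_VALUE = max(VALUE))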
# ID TIME MAX_VALUE
# 1 51 2015-04-17 02:00:00 105
# 2 51 2015-04-17 03:00:00 104
# 3 51 2015-04-17 04:00:00 102
# 4 51 2015-04-17 05:00:00 105
# 5 51 2015-04-17 06:00:00 105
Here is a possible solution using data.table:
library(data.table)
setDT(df)[, max(VALUE), by = .(START_TIME = sub(":.*", "", START_TIME))]
# START_TIME V1
# 1: 2015-04-17 01 105
# 2: 2015-04-17 02 104
# 3: 2015-04-17 03 102
# 4: 2015-04-17 04 105
# 5: 2015-04-17 05 105
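Note that this last result labels each group by the start of its hour ("2015-04-17 01"), while the question asks for the end of the hour. A small base R sketch of what the sub() call does, and how the key could be turned into an end-of-hour label; the variable names and the three sample rows are chosen only for illustration, and the strings are assumed to be in UTC.

```r
# Sketch: how sub(":.*", "", x) builds an hour key, and how to relabel
# that key as the end of the hour. Names here are illustrative only.
stamps <- c("2015-04-17 01:00:00", "2015-04-17 01:50:00", "2015-04-17 02:10:00")
values <- c(98, 105, 103)

# Drop everything from the first colon onward, leaving "YYYY-MM-DD HH".
hour_key <- sub(":.*", "", stamps)

# Maximum per hour key; the names of the result are the keys.
hour_max <- tapply(values, hour_key, max)

# Parse each key back to a time and add one hour to get the end of the window.
hour_end <- as.POSIXct(names(hour_max), format = "%Y-%m-%d %H", tz = "UTC") + 3600
result <- data.frame(END_OF_HOUR = hour_end, MAX_VALUE = as.vector(hour_max))
```

The string key is convenient for grouping, but converting it back to POSIXct is what allows time arithmetic such as adding 3600 seconds.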