简体   繁体   中英

Apply functions to hourly data in R

I have the following data in HISTORY table with column names as:

ID, START_TIME, END_TIME, VALUE

51,2015-04-17 01:00:00,2015-04-17 01:10:00,98
51,2015-04-17 01:10:00,2015-04-17 01:20:00,96
51,2015-04-17 01:20:00,2015-04-17 01:30:00,97
51,2015-04-17 01:30:00,2015-04-17 01:40:00,99
51,2015-04-17 01:40:00,2015-04-17 01:50:00,98
51,2015-04-17 01:50:00,2015-04-17 02:00:00,105
51,2015-04-17 02:00:00,2015-04-17 02:10:00,103
51,2015-04-17 02:10:00,2015-04-17 02:20:00,101
51,2015-04-17 02:20:00,2015-04-17 02:30:00,100
51,2015-04-17 02:30:00,2015-04-17 02:40:00,104
51,2015-04-17 02:40:00,2015-04-17 02:50:00,102
51,2015-04-17 02:50:00,2015-04-17 03:00:00,98
51,2015-04-17 03:00:00,2015-04-17 03:10:00,97
51,2015-04-17 03:10:00,2015-04-17 03:20:00,96
51,2015-04-17 03:20:00,2015-04-17 03:30:00,99
51,2015-04-17 03:30:00,2015-04-17 03:40:00,100
51,2015-04-17 03:40:00,2015-04-17 03:50:00,101
51,2015-04-17 03:50:00,2015-04-17 04:00:00,102
51,2015-04-17 04:00:00,2015-04-17 04:10:00,99
51,2015-04-17 04:10:00,2015-04-17 04:20:00,104
51,2015-04-17 04:20:00,2015-04-17 04:30:00,105
51,2015-04-17 04:30:00,2015-04-17 04:40:00,103
51,2015-04-17 04:40:00,2015-04-17 04:50:00,98
51,2015-04-17 04:50:00,2015-04-17 05:00:00,97
51,2015-04-17 05:00:00,2015-04-17 05:10:00,101
51,2015-04-17 05:10:00,2015-04-17 05:20:00,103
51,2015-04-17 05:20:00,2015-04-17 05:30:00,101
51,2015-04-17 05:30:00,2015-04-17 05:40:00,105
51,2015-04-17 05:40:00,2015-04-17 05:50:00,102
51,2015-04-17 05:50:00,2015-04-17 06:00:00,98

I want to apply the function like max() to the VALUE column but with some frequency. If frequency is suppose 1hour then max function will be applied for 5 different sets for this data.

Ex. From start time 2015-04-17 01:00:00 to 2015-04-17 02:00:00 and so on. How to achieve this in r. Final output would be like this:

51, 2015-04-17 02:00:00, 105
51, 2015-04-17 03:00:00, 102
51, 2015-04-17 04:00:00, 104
51, 2015-04-17 05:00:00, 105
51, 2015-04-17 06:00:00, 105

where above columns are ID, START_TIME upto which max() is calculated, value which is the result of max() function for that hour. How to make this possible in r. Using intervals or something else?

Thanks..

Here's another way using data.table

library(data.table)
setDT(df)[, .(MAX_VALUE = max(VALUE)), 
             by = .(ID, START_TIME = as.POSIXct(START_TIME, format = "%F %H") + 3600)]
#    ID          START_TIME MAX_VALUE
# 1: 51 2015-04-17 02:00:00       105
# 2: 51 2015-04-17 03:00:00       104
# 3: 51 2015-04-17 04:00:00       102
# 4: 51 2015-04-17 05:00:00       105
# 5: 51 2015-04-17 06:00:00       105

Or a similar solution without any packages dependencies

df$START_TIME2 <-  as.POSIXct(df$START_TIME, format = "%F %H") + 3600
aggregate(VALUE ~ ID + START_TIME2, df, max)
#   ID         START_TIME2 VALUE
# 1 51 2015-04-17 02:00:00   105
# 2 51 2015-04-17 03:00:00   104
# 3 51 2015-04-17 04:00:00   102
# 4 51 2015-04-17 05:00:00   105
# 5 51 2015-04-17 06:00:00   105

You could try

library(dplyr)
HISTORY %>% 
  group_by(ID, TIME = format(START_TIME + 60*60, "%Y-%m-%d %H:00:00")) %>% 
  summarise(MAX_VALUE = max(VALUE))
#   ID                TIME MAX_VALUE
# 1 51 2015-04-17 02:00:00       105
# 2 51 2015-04-17 03:00:00       104
# 3 51 2015-04-17 04:00:00       102
# 4 51 2015-04-17 05:00:00       105
# 5 51 2015-04-17 06:00:00       105

Here's a possible solution using data.table

library(data.table)
setDT(df)[, max(VALUE), by = .(START_TIME = sub(":.*", "", START_TIME))]
     START_TIME  V1
1: 2015-04-17 01 105
2: 2015-04-17 02 104
3: 2015-04-17 03 102
4: 2015-04-17 04 105
5: 2015-04-17 05 105

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM