I have a data.table, allData, containing data recorded roughly every second (timestamps are POSIXct) on different nights. Some nights fall on the same date, however, because the data is collected from different people, so I have a column nightNo that identifies each night.
timestamp nightNo data1 data2
2018-10-19 19:15:00 1 1 7
2018-10-19 19:15:01 1 2 8
2018-10-19 19:15:02 1 3 9
2018-10-19 18:10:22 2 4 10
2018-10-19 18:10:23 2 5 11
2018-10-19 18:10:24 2 6 12
I'd like to aggregate the data to minutes (per night). Based on this question, I've come up with the following code:
aggregate_minute <- function(df){
  df %>%
    group_by(timestamp = cut(timestamp, breaks = "1 min")) %>%
    summarise(data1 = mean(data1), data2 = mean(data2)) %>%
    as.data.table()
}
allData <- allData[, aggregate_minute(allData), by=nightNo]
However, my data.table is quite large and this code isn't fast enough. Is there a more efficient way to solve this problem?
allData <- data.table(timestamp = c(rep(Sys.time(), 3), rep(Sys.time() + 320, 3)),
                      nightNo = rep(1:2, c(3, 3)),
                      data1 = 1:6,
                      data2 = 7:12)
timestamp nightNo data1 data2
1: 2018-06-14 10:43:11 1 1 7
2: 2018-06-14 10:43:11 1 2 8
3: 2018-06-14 10:43:11 1 3 9
4: 2018-06-14 10:48:31 2 4 10
5: 2018-06-14 10:48:31 2 5 11
6: 2018-06-14 10:48:31 2 6 12
allData[, .(data1 = mean(data1), data2 = mean(data2)), by = .(nightNo, timestamp = cut(timestamp, breaks= "1 min"))]
nightNo timestamp data1 data2
1: 1 2018-06-14 10:43:00 2 8
2: 2 2018-06-14 10:48:00 5 11
> system.time(replicate(500, allData[, aggregate_minute(allData), by=nightNo]))
user system elapsed
3.25 0.02 3.31
> system.time(replicate(500, allData[, .(data1 = mean(data1), data2 = mean(data2)), by = .(nightNo, timestamp = cut(timestamp, breaks= "1 min"))]))
user system elapsed
1.02 0.04 1.06
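As a further variation, here is a minimal sketch that avoids cut() altogether. cut() on a POSIXct column builds a factor of interval labels, so flooring the underlying seconds with integer arithmetic may be cheaper on a large table; that is worth benchmarking on the real data rather than taking for granted. The sample values, the UTC time zone, and the names minute and perMinute are assumptions made for illustration.

```r
library(data.table)

# Illustrative sample data in the shape described in the question.
allData <- data.table(
  timestamp = as.POSIXct("2018-10-19 19:15:00", tz = "UTC") +
    c(0, 1, 2, 3600, 3601, 3602),
  nightNo   = rep(1:2, each = 3),
  data1     = 1:6,
  data2     = 7:12
)

# Floor each timestamp to the start of its minute by integer-dividing the
# seconds since the epoch by 60, then restore the POSIXct class.
# "minute" is a hypothetical helper column name.
allData[, minute := as.POSIXct(
  (as.numeric(timestamp) %/% 60) * 60,
  origin = "1970-01-01", tz = "UTC"
)]

# Average every data column per night and per minute.
perMinute <- allData[, lapply(.SD, mean),
                     by = .(nightNo, minute),
                     .SDcols = c("data1", "data2")]
print(perMinute)
```

Using lapply(.SD, mean) with .SDcols also means additional data columns can be added to the vector without rewriting the aggregation.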
You can use lubridate to 'round' the dates and then use data.table to aggregate the columns.
library(data.table)
library(lubridate)
Reproducible data:
text <- "timestamp nightNo data1 data2
'2018-10-19 19:15:00' 1 1 7
'2018-10-19 19:15:01' 1 2 8
'2018-10-19 19:15:02' 1 3 9
'2018-10-19 18:10:22' 2 4 10
'2018-10-19 18:10:23' 2 5 11
'2018-10-19 18:10:24' 2 6 12"
allData <- read.table(text = text, header = TRUE, stringsAsFactors = FALSE)
Convert the data frame to a data.table by reference:
setDT(allData)
Parse the timestamp and floor it to the start of its minute:
allData[, timestamp := floor_date(ymd_hms(timestamp), "minutes")]
Change the type of the integer columns to numeric, so that the means (which are doubles) can be assigned back into them:
allData[, ':='(data1 = as.numeric(data1),
               data2 = as.numeric(data2))]
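Replace the data columns with their means per nightNo and per floored minute (grouping by nightNo alone would average across a whole night rather than per minute):
allData[, ':='(data1 = mean(data1),
               data2 = mean(data2)),
        by = .(nightNo, timestamp)]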
The result is:
timestamp nightNo data1 data2
1: 2018-10-19 19:15:00 1 2 8
2: 2018-10-19 19:15:00 1 2 8
3: 2018-10-19 19:15:00 1 2 8
4: 2018-10-19 18:10:00 2 5 11
5: 2018-10-19 18:10:00 2 5 11
6: 2018-10-19 18:10:00 2 5 11
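The result above repeats the group mean on every original row, because := updates columns in place and does not reduce the row count. If one row per night and minute is wanted, a sketch along the following lines should do it; it summarises inside j rather than assigning with :=. The names raw and collapsed are hypothetical, and the sample values simply mirror the example data.

```r
library(data.table)
library(lubridate)

# Same sample values as the example, with timestamps still stored as text.
raw <- data.table(
  timestamp = c("2018-10-19 19:15:00", "2018-10-19 19:15:01",
                "2018-10-19 19:15:02", "2018-10-19 18:10:22",
                "2018-10-19 18:10:23", "2018-10-19 18:10:24"),
  nightNo = rep(1:2, each = 3),
  data1   = 1:6,
  data2   = 7:12
)

# Group by night and by the floored minute, returning one summary row per
# group instead of overwriting the original rows.
collapsed <- raw[, .(data1 = mean(data1), data2 = mean(data2)),
                 by = .(nightNo,
                        timestamp = floor_date(ymd_hms(timestamp), "minute"))]
print(collapsed)
```

Because the summary builds new columns, the integer columns do not need to be converted to numeric first.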