R subset data by date and hour; for loop or sapply()?
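Imagine a data frame representing a flock of sheep wearing collars fitted with RFID chips. Data collectors are spread across the field, with modems attached to utility poles. Each time a unique sheep comes within range of one of these poles, it is counted as an "event", which is stored in an Arduino device connected to the modem on the pole. Each Arduino device has an address, and roughly every five minutes it calls in through the modem to report its status and its number of events.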
> head(wow)
  address        checkin_time status_id number_events
1      11 2016-08-08 00:04:40         7            10
2      11 2016-08-08 00:09:53         7            13
3      11 2016-08-08 00:15:06         7            12
4      11 2016-08-08 00:20:20         7            11
5      11 2016-08-08 00:25:33         7            13
6      11 2016-08-08 00:30:45         7             5
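I am trying to create a new matrix with every unique date as a row and every unique hour of the day as a column, where each cell holds the total number of events for that date and hour.
Here is the code I ran (truncated):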
allDays <- unique(as.Date(wow$checkin_time))
for (d in allDays) {
oneAM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'00:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '00:59:59')))
twoAM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'01:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '01:59:59')))
threeAM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'02:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '02:59:59')))
. . .
elevenPM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'22:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '22:59:59')))
twelvePM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'23:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '23:59:59')))
dayAsHours <- c(sum(oneAM$number_events), sum(twoAM$number_events), sum(threeAM$number_events), sum(fourAM$number_events), sum(fiveAM$number_events), sum(sixAM$number_events),
sum(sevenAM$number_events), sum(eightAM$number_events), sum(nineAM$number_events), sum(tenAM$number_events), sum(elevenAM$number_events),
sum(twelveAM$number_events), sum(onePM$number_events), sum(twoPM$number_events), sum(threePM$number_events), sum(fourPM$number_events),
sum(fivePM$number_events), sum(sixPM$number_events), sum(sevenPM$number_events), sum(eightPM$number_events), sum(ninePM$number_events),
sum(tenPM$number_events), sum(elevenPM$number_events), sum(twelvePM$number_events))
dateMatrix <- rbind(dateMatrix, dayAsHours)
}
The code above works when I hard-code a value for d, but it stops working once I wrap it in the for loop.
The error I get is:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
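Also, I know I should probably be using sapply() here instead of a for loop, but I am having trouble working out how to structure the function. Would wow be the data the function is applied to, or would it be allDays?
Any pointers in the right direction would be very helpful.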
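The error is consistent with how for handles a Date vector: iterating directly over allDays drops the Date class, so d becomes a bare number of days since 1970-01-01, and paste(d, '00:00:00') yields a string such as "17021 00:00:00" that as.POSIXct() cannot parse. The following is a minimal sketch of one workaround, looping over indices instead of values. The two sample dates and the names starts and parsed are made up for illustration, and the time zone is fixed to UTC only to keep the result reproducible.

```r
# Hypothetical sample dates, standing in for unique(as.Date(wow$checkin_time))
allDays <- as.Date(c("2016-08-08", "2016-08-09"))

# Iterating directly over a Date vector yields plain numbers
for (d in allDays) {
  print(class(d))  # "numeric": the Date class has been dropped
}

# Iterating over indices keeps the Date class, so paste() formats the date
starts <- character(0)
for (i in seq_along(allDays)) {
  d <- allDays[i]
  starts <- c(starts, paste(d, "00:00:00"))
}

# These strings are now in a format as.POSIXct() accepts
parsed <- as.POSIXct(starts, tz = "UTC")
print(parsed)
```

Wrapping d in as.character() is not enough on its own, because d has already lost its class inside the loop; the index-based loop (or a loop over as.character(allDays)) avoids the problem at the source.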
One way to get what I think you want is to use format to extract the date and the hour from checkin_time, and then use dplyr:
library(dplyr)
library(tidyr)
result <- wow %>% mutate(Date=format(checkin_time, format="%Y-%m-%d"),
Hour=format(checkin_time, format="%H")) %>%
group_by(Date,Hour) %>%
summarise(number_events=sum(number_events)) %>%
spread(Hour, number_events)
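Notes:
mutate creates the Date and Hour columns by extracting the date and the hour from checkin_time. group_by groups by Date and Hour, and summarise is used to sum all the number_events for each combination of Date and Hour. spread, from tidyr, reshapes the result into a table with Date as rows and Hour as columns.
I modified the input data wow that you posted, adding more dates and hours: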
wow <- structure(list(address = c(11L, 11L, 11L, 11L, 11L, 11L), checkin_time = structure(c(1470629080,
1470629393, 1470716106, 1470720020, 1470803133, 1470803445), class = c("POSIXct",
"POSIXt"), tzone = ""), status_id = c(7L, 7L, 7L, 7L, 7L, 7L),
number_events = c(10L, 13L, 12L, 11L, 13L, 5L)), .Names = c("address",
"checkin_time", "status_id", "number_events"), row.names = c(NA,
-6L), class = "data.frame")
## address checkin_time status_id number_events
##1 11 2016-08-08 00:04:40 7 10
##2 11 2016-08-08 00:09:53 7 13
##3 11 2016-08-09 00:15:06 7 12
##4 11 2016-08-09 01:20:20 7 11
##5 11 2016-08-10 00:25:33 7 13
##6 11 2016-08-10 00:30:45 7 5
With this data:
print(result)
##Source: local data frame [3 x 3]
##Groups: Date [3]
##
## Date 00 01
##* <chr> <int> <int>
##1 2016-08-08 23 NA
##2 2016-08-09 12 11
##3 2016-08-10 18 NA
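Since the question asks for a matrix and about base R alternatives to the loop, here is a hedged sketch using xtabs instead of dplyr. It is not the method from the answer above, only one possible base R equivalent. The sample data frame is rebuilt by hand with the time zone fixed to UTC (an assumption made for reproducibility), and the names day and hour are introduced for illustration. Unlike the spread output, empty date/hour cells come out as 0 rather than NA, and all 24 hours get a column.

```r
# Small sample in the same shape as wow (times fixed to UTC for reproducibility)
wow <- data.frame(
  address = 11L,
  checkin_time = as.POSIXct(
    c("2016-08-08 00:04:40", "2016-08-08 00:09:53", "2016-08-09 00:15:06",
      "2016-08-09 01:20:20", "2016-08-10 00:25:33", "2016-08-10 00:30:45"),
    tz = "UTC"),
  status_id = 7L,
  number_events = c(10L, 13L, 12L, 11L, 13L, 5L)
)

# Extract the date and the hour as grouping keys
day <- format(wow$checkin_time, "%Y-%m-%d")
# Fix all 24 hour levels so every hour gets a column, even with no data
hour <- factor(format(wow$checkin_time, "%H"), levels = sprintf("%02d", 0:23))

# Sum number_events within each date/hour cell; empty cells become 0
dateMatrix <- xtabs(number_events ~ day + hour, data = wow)

print(dateMatrix[, c("00", "01")])
```

With this approach neither wow nor allDays needs to be looped over: the grouping keys are computed once for every row, and the cross-tabulation does the summing.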