
R subset data by date and hour; for loop or sapply()?

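A hypothetical data frame represents a flock of sheep with RFID chips on their collars. Data collectors are placed throughout the field, with modems attached to utility poles. Each time a unique sheep comes within range of one of these poles, it is counted as an "event", which is stored in an Arduino device connected to the modem on the pole. Each Arduino device has an address, and roughly every five minutes it calls in through the modem to report its status and number of events.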

> head(wow)
  address        checkin_time status_id number_events
1      11 2016-08-08 00:04:40         7            10
2      11 2016-08-08 00:09:53         7            13
3      11 2016-08-08 00:15:06         7            12
4      11 2016-08-08 00:20:20         7            11
5      11 2016-08-08 00:25:33         7            13
6      11 2016-08-08 00:30:45         7             5

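I am trying to create a new matrix with every unique date as a row and every unique hour of the day as a column, where each cell holds the total number of events for that date and hour.

Here is the code I ran (truncated):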

allDays <- unique(as.Date(wow$checkin_time))
for (d in allDays) {
oneAM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'00:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '00:59:59')))
twoAM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'01:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '01:59:59')))
threeAM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'02:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '02:59:59')))
. . .

elevenPM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'22:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '22:59:59')))
twelvePM <- subset(wow, as.POSIXct(wow$checkin_time) >= as.POSIXct(paste(d,'23:00:00')) & as.POSIXct(wow$checkin_time) <= as.POSIXct(paste(d, '23:59:59')))
dayAsHours <- c(sum(oneAM$number_events), sum(twoAM$number_events), sum(threeAM$number_events), sum(fourAM$number_events), sum(fiveAM$number_events), sum(sixAM$number_events), 
                sum(sevenAM$number_events), sum(eightAM$number_events), sum(nineAM$number_events), sum(tenAM$number_events), sum(elevenAM$number_events), 
                sum(twelveAM$number_events), sum(onePM$number_events), sum(twoPM$number_events), sum(threePM$number_events), sum(fourPM$number_events), 
                sum(fivePM$number_events), sum(sixPM$number_events), sum(sevenPM$number_events), sum(eightPM$number_events), sum(ninePM$number_events), 
                sum(tenPM$number_events), sum(elevenPM$number_events), sum(twelvePM$number_events))
dateMatrix <- rbind(dateMatrix, dayAsHours)
}

The code above works when I hard-code a single value for d, but it stops working once I wrap it in the for loop.

The error I get is:

Error in as.POSIXlt.character(x, tz, ...) : 
character string is not in a standard unambiguous format

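Also, I know I should probably be using sapply() here instead of a for loop, but I am having a hard time figuring out how to structure the function. Would wow be the data the function is applied to, or would it be allDays?

Any pointers in the right direction would be very helpful.

One approach, which I think is what you want, is to use format to strip out the date and the hour from checkin_time, and then use dplyr: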

library(dplyr)
library(tidyr)
result <- wow %>% mutate(Date=format(checkin_time, format="%Y-%m-%d"),
                         Hour=format(checkin_time, format="%H")) %>%
                  group_by(Date,Hour) %>% 
                  summarise(number_events=sum(number_events)) %>%
                  spread(Hour, number_events)

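Notes:

  1. Use mutate to create the Date and Hour columns by stripping the date and the hour out of checkin_time.
  2. group_by Date and Hour, then use summarise to sum all the number_events for each Date and Hour.
  3. Use spread from tidyr to turn the result into a table with Date as rows and Hour as columns.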
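For readers without dplyr and tidyr, the same three steps can be sketched in base R with format and tapply. This is only an illustration under assumptions: the sample wow below is a small made-up frame in the shape of the question's data, the timestamps are assumed to be in UTC, and the names day, hour, and dateMatrix are chosen here for the example. Making hour a factor with all 24 levels gives one column per hour even when an hour has no check-ins.

```r
# Assumed sample data (UTC), in the same shape as wow
wow <- data.frame(
  address = 11L,
  checkin_time = as.POSIXct(c("2016-08-08 00:04:40", "2016-08-08 00:09:53",
                              "2016-08-09 00:15:06", "2016-08-09 01:20:20",
                              "2016-08-10 00:25:33", "2016-08-10 00:30:45"),
                            tz = "UTC"),
  status_id = 7L,
  number_events = c(10L, 13L, 12L, 11L, 13L, 5L)
)

# Step 1: strip the date and the hour out of checkin_time
day  <- format(wow$checkin_time, "%Y-%m-%d")
hour <- factor(format(wow$checkin_time, "%H"), levels = sprintf("%02d", 0:23))

# Steps 2 and 3: sum number_events per (day, hour) cell, days as rows
dateMatrix <- tapply(wow$number_events, list(Date = day, Hour = hour), sum)

# Cells with no check-ins come back as NA; treat them as zero events
dateMatrix[is.na(dateMatrix)] <- 0

print(dateMatrix[, c("00", "01")])
```

Whether empty cells should be 0 or NA depends on whether "no report" and "zero events" mean the same thing for your devices; the replacement line can simply be dropped to keep NA.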

I modified the input data wow that you posted to add more dates and hours:

wow <- structure(list(address = c(11L, 11L, 11L, 11L, 11L, 11L), checkin_time = structure(c(1470629080, 
1470629393, 1470716106, 1470720020, 1470803133, 1470803445), class = c("POSIXct", 
"POSIXt"), tzone = ""), status_id = c(7L, 7L, 7L, 7L, 7L, 7L), 
    number_events = c(10L, 13L, 12L, 11L, 13L, 5L)), .Names = c("address", 
"checkin_time", "status_id", "number_events"), row.names = c(NA, 
-6L), class = "data.frame")
##  address        checkin_time status_id number_events
##1      11 2016-08-08 00:04:40         7            10
##2      11 2016-08-08 00:09:53         7            13
##3      11 2016-08-09 00:15:06         7            12
##4      11 2016-08-09 01:20:20         7            11
##5      11 2016-08-10 00:25:33         7            13
##6      11 2016-08-10 00:30:45         7             5

Using this data:

print(result)
##Source: local data frame [3 x 3]
##Groups: Date [3]
##
##        Date    00    01
##*      <chr> <int> <int>
##1 2016-08-08    23    NA
##2 2016-08-09    12    11
##3 2016-08-10    18    NA
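On the error in the question: a for loop over a Date vector hands back each element as a plain number with the Date class dropped, so paste(d, '00:00:00') builds a string such as "17021 00:00:00", which as.POSIXct cannot parse. As for sapply(), the function would be applied over the days (allDays), with wow looked up inside it. A minimal sketch of that idea, assuming a small UTC sample in the shape of wow; the helper names dayLabels, hourLabels, stampDay, and stampHour are introduced here only for illustration:

```r
# Assumed sample data (UTC), in the same shape as wow
wow <- data.frame(
  checkin_time = as.POSIXct(c("2016-08-08 00:04:40", "2016-08-08 00:09:53",
                              "2016-08-09 00:15:06", "2016-08-09 01:20:20",
                              "2016-08-10 00:25:33", "2016-08-10 00:30:45"),
                            tz = "UTC"),
  number_events = c(10L, 13L, 12L, 11L, 13L, 5L)
)

allDays <- unique(as.Date(wow$checkin_time))

# The loop variable loses its Date class and becomes a bare day count
for (d in allDays) print(class(d))

# Iterate over character labels instead, so nothing needs re-parsing
dayLabels  <- format(allDays)            # "YYYY-MM-DD" strings
hourLabels <- sprintf("%02d", 0:23)      # "00" ... "23"
stampDay   <- format(wow$checkin_time, "%Y-%m-%d")
stampHour  <- format(wow$checkin_time, "%H")

# Outer sapply runs over days, inner sapply over hours;
# each cell is the sum of number_events for that day and hour
dateMatrix <- t(sapply(dayLabels, function(day) {
  sapply(hourLabels, function(hr) {
    sum(wow$number_events[stampDay == day & stampHour == hr])
  })
}))

print(dateMatrix[, c("00", "01")])
```

This replaces the 24 hand-written subset() calls with one inner loop over hours. Hours with no check-ins come out as 0 here because sum() of an empty vector is 0.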
