
Aggregating hourly count in dplyr

I have the following data frame in R:

Date                      ID      
01-01-2017 12:39:00       CDF
01-01-2017 01:39:00       WED
01-01-2017 02:39:00       QWE
01-01-2017 05:39:00       TYU
01-01-2017 17:39:00       ERT
02-01-2017 02:30:34       DEF   

I want to calculate the hourly count of IDs. The data frame I want is:

Date           hours               Count
01-01-2017     00:00 - 01:00       1
01-01-2017     01:00 - 02:00       1
01-01-2017     02:00 - 03:00       1
01-01-2017     03:00 - 04:00       0
01-01-2017     04:00 - 05:00       0
01-01-2017     05:00 - 06:00       1
.
01-01-2017     23:00 - 00:00       0 
.
02-01-2017     12:00 - 01:00       0 
02-01-2017     01:00 - 02:00       0
02-01-2017     02:00 - 03:00       1

Where there is no ID, I want the count for that hourly slot to be zero. Each date should contain all 24 hourly slots.

How can I achieve this in R?

Here is one way to do it using lubridate and base R.

In the provided dataset your first observation is 01-01-2017 12:39:00, but in the desired output it is counted in the 00:00 - 01:00 slot. In the code below, 12:39:00 would be treated as 12:39 PM, so I assume you meant 00:39:00. Let me know if that is not the case.
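A quick check of why the ambiguity matters: `%H` is the 24-hour field, so "12" is read as noon. Only an explicit AM/PM marker parsed with `%I` and `%p` maps "12" to midnight. This is a minimal base R sketch; the "AM" suffix is a hypothetical addition to the question's data, and `%p` parsing assumes an English/C locale.

```r
# "%H" is the 24-hour field, so "12" is read as noon, not midnight
x <- as.POSIXct("01-01-2017 12:39:00", format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
as.integer(format(x, "%H"))   # 12, i.e. the 12:00 - 13:00 slot

# a 12-hour clock needs an AM/PM marker (hypothetical here);
# "%I" (hour 01-12) with "%p" maps "12 AM" to midnight (locale-dependent)
y <- as.POSIXct("01-01-2017 12:39:00 AM", format = "%d-%m-%Y %I:%M:%S %p", tz = "UTC")
as.integer(format(y, "%H"))   # 0, i.e. the 00:00 - 01:00 slot
```

Without such a marker, the string alone cannot tell noon from midnight, which is why the code below edits the input instead.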

library(lubridate)

# the data (first timestamp 12:39:00 rewritten as 00:39:00)
txt <- "Date,ID
01-01-2017 00:39:00,CDF
01-01-2017 01:39:00,WED
01-01-2017 02:39:00,QWE
01-01-2017 05:39:00,TYU
01-01-2017 17:39:00,ERT
02-01-2017 02:30:34,DEF"

df <- read.table(text = txt, sep = ",", header = TRUE, stringsAsFactors = FALSE)

# transform the day-first date strings into date-times (UTC avoids DST gaps)
dates <- as.POSIXct(strptime(df$Date, "%d-%m-%Y %H:%M:%S", tz = "UTC"))

# hourly sequence covering all 24 hours of every date, 00:00 to 23:00
total_time <- seq(from = floor_date(min(dates), "day"),
                  to   = floor_date(max(dates), "day") + 23 * 60 * 60,
                  by   = "hour")

# count observations per hour (handles more than one per interval)
hour_bins <- floor_date(dates, "hour")
count <- vapply(seq_along(total_time),
                function(i) sum(hour_bins == total_time[i]),
                integer(1))

# format() keeps the UTC time zone of total_time (strftime() would use local time)
data.frame(Date  = format(total_time, "%d-%m-%Y"),
           hours = paste(format(total_time, "%H:%M"),
                         format(total_time + 60 * 60, "%H:%M"),
                         sep = " - "),
           Count = count)

#          Date         hours Count
# 1  01-01-2017 00:00 - 01:00     1
# 2  01-01-2017 01:00 - 02:00     1
# 3  01-01-2017 02:00 - 03:00     1
# 4  01-01-2017 03:00 - 04:00     0
# 5  01-01-2017 04:00 - 05:00     0
# 6  01-01-2017 05:00 - 06:00     1
# 7  01-01-2017 06:00 - 07:00     0
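The per-hour `sum()` loop can also be replaced by base R's `cut()` and `table()`: `cut()` assigns each timestamp to a half-open hourly interval `[start, start + 1h)`, and `table()` reports empty intervals as zero. This is a swapped-in alternative, not the answer's code; the timestamps below are hypothetical and UTC is assumed.

```r
# hypothetical data: two observations in the same hour, one two hours later
dates <- as.POSIXct(c("2017-01-01 00:10:00", "2017-01-01 00:50:00",
                      "2017-01-01 02:05:00"), tz = "UTC")

# 24 hourly bins for one day need 25 break points
day_start <- as.POSIXct("2017-01-01", tz = "UTC")
breaks <- seq(day_start, by = "hour", length.out = 25)

# right = FALSE makes each bin [start, end), matching floor_date(x, "hour")
counts <- table(cut(dates, breaks = breaks, right = FALSE))

as.vector(counts)   # 2 0 1 0 0 ... (24 values)
```

Because the factor produced by `cut()` carries every bin as a level, hours with no observations still appear in the table with a count of 0.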

The tidyverse provides some useful functions for this, such as count / tally and complete:

library(tidyverse)
library(lubridate)

dat <- read_csv('Date, ID      
  01-01-2017 12:39:00, CDF
  01-01-2017 01:39:00, WED
  01-01-2017 02:39:00, QWE
  01-01-2017 05:39:00, TYU
  01-01-2017 17:39:00, ERT
  02-01-2017 02:30:34, DEF'
) 

dat %>% 
   mutate(
       Date = dmy_hms(Date),
       day = floor_date(Date, 'day'), 
       hour = hour(Date)
   ) %>%
   group_by(day, hour) %>%
   tally %>%
   # tally() leaves the data grouped by day; ungroup so complete() can expand day itself
   ungroup() %>%
   complete(day, hour = 0:23, fill = list(n = 0))


## A tibble: 48 x 3
#          day  hour     n
#       <dttm> <int> <dbl>
# 1 2017-01-01     0     0
# 2 2017-01-01     1     1
# 3 2017-01-01     2     1
# 4 2017-01-01     3     0
# 5 2017-01-01     4     0
# 6 2017-01-01     5     1
# 7 2017-01-01     6     0
# 8 2017-01-01     7     0
# 9 2017-01-01     8     0
#10 2017-01-01     9     0
## ... with 38 more rows
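To get the exact layout from the question (Date, hours, Count), the `day`/`hour`/`n` columns can be reformatted with `format()` and `sprintf()`. A minimal base R sketch on a small hypothetical result in that shape; the same expressions should also work inside a dplyr `mutate()`. The `%% 24` wraps the last slot to "23:00 - 00:00".

```r
# hypothetical result in the day/hour/n shape produced by the pipeline above
res <- data.frame(
  day  = as.POSIXct(rep("2017-01-01", 3), tz = "UTC"),
  hour = c(0L, 1L, 23L),
  n    = c(0, 1, 0)
)

out <- data.frame(
  Date  = format(res$day, "%d-%m-%Y"),
  # zero-padded start and end hour; (hour + 1) %% 24 turns 24 into 00
  hours = sprintf("%02d:00 - %02d:00", res$hour, (res$hour + 1L) %% 24L),
  Count = res$n
)
out
```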


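Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM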