
Create time of day column categories based on hour of time interval

I am trying to create a "time of day" column where I categorize time of day into sections based on the hour. For example, any time from 20:00-21:59 would be in the "20-22" category. I plan to do this many times, for various intervals (eg two-hour intervals, 3-hour intervals, and so on).

Here's an example of my data:

library(lubridate)
library(chron)

table <- "ID        date time
1 1 2016-04-30 21:00:00
2 2 2016-04-30 23:15:00
3 3 2016-04-30 19:30:00
4 4 2016-04-30 17:45:00
5 5 2016-04-30 14:00:00
6 6 2016-04-30 13:15:00
7 7 2016-04-30 05:30:00
8 8 2016-04-30 07:45:00
9 9 2016-04-30 09:00:00
10 10 2016-04-30 13:15:00
11 11 2016-04-30 10:30:00
12 12 2016-04-30 11:45:00
13 13 2016-05-01 12:00:00
14 14 2016-05-01 00:15:00
15 15 2016-05-01 01:30:00
16 16 2016-05-01 03:45:00
17 17 2016-05-01 04:00:00
18 18 2016-05-01 06:15:00
19 19 2016-05-01 19:30:00
20 20 2016-05-01 20:00:00"

# Create dataframe
df <- read.table(text=table, header = TRUE)

# Change time format
df$time <- times(df$time) 

# Add hour
df$hour <- hour(hms(df$time))
str(df)

I have tried various resources from this site, but I always have some issue with the resulting data. Here is a breakdown of what I have tried:

  1. The below code does not work because any time that falls on the hour (eg 20:00:00 in this case) goes into the category before it (18-20) instead of the one it should be in (20-22). This code also does not work for the 3-hour interval.
breaks <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) / 24
labels <- c("00-02", "02-04", "04-06", "06-08", "08-10", "10-12", "12-14", "14-16",
             "16-18", "18-20", "20-22", "22-00")
df$tod <- cut(df$time, breaks, labels, include.lowest = TRUE)

  2. This code does not work because it produces NA values for every time in one category (eg 23:15:00, which should fall in 22-00).
breaks2 <- hour(hm("02:00", "04:00", "06:00", "08:00", "10:00", "12:00", "14:00", "16:00",
                   "18:00", "20:00", "22:00", "00:00", "01:59"))
labels2 <- c("22-00", "00-02", "02-04", "04-06", "06-08", "08-10", "10-12", "12-14", "14-16",
             "16-18", "18-20", "20-22")
df$tod2 <- cut(x=df$hour, breaks=breaks2, labels=labels2, include.lowest=TRUE)

Any help would be appreciated!

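Your first attempt fails because cut() builds right-closed intervals by default, so 20:00 lands in (18, 20] rather than in the bin that starts at 20. The setting you are looking for is right = FALSE, which makes the bins left-closed; include.lowest only controls whether the outermost break value is included in its bin. You could possibly run into floating point precision issues when dividing by 24, so I think it's simplest to cut the hour column directly: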

df$hour <- hour(hms(df$time))
hr_breaks <- seq(0, 24, by = 2)
hr_labels <- c("00-02", "02-04", "04-06", "06-08", "08-10", "10-12", "12-14", "14-16",
               "16-18", "18-20", "20-22", "22-00")
df$tod <- cut(df$hour, breaks = hr_breaks,
              labels = hr_labels,
              include.lowest = TRUE, right = FALSE)
df
#    ID       date     time hour   tod
# 1   1 2016-04-30 21:00:00   21 20-22
# 2   2 2016-04-30 23:15:00   23 22-00
# 3   3 2016-04-30 19:30:00   19 18-20
# 4   4 2016-04-30 17:45:00   17 16-18
# 5   5 2016-04-30 14:00:00   14 14-16
# 6   6 2016-04-30 13:15:00   13 12-14
# 7   7 2016-04-30 05:30:00    5 04-06
# 8   8 2016-04-30 07:45:00    7 06-08
# 9   9 2016-04-30 09:00:00    9 08-10
# 10 10 2016-04-30 13:15:00   13 12-14
# 11 11 2016-04-30 10:30:00   10 10-12
# 12 12 2016-04-30 11:45:00   11 10-12
# 13 13 2016-05-01 12:00:00   12 12-14
# 14 14 2016-05-01 00:15:00    0 00-02
# 15 15 2016-05-01 01:30:00    1 00-02
# 16 16 2016-05-01 03:45:00    3 02-04
# 17 17 2016-05-01 04:00:00    4 04-06
# 18 18 2016-05-01 06:15:00    6 06-08
# 19 19 2016-05-01 19:30:00   19 18-20
# 20 20 2016-05-01 20:00:00   20 20-22
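
Since you want to repeat this for several interval widths, the breaks and labels can be generated instead of typed out. Below is a minimal sketch in base R. The helper name tod_bins and its width argument are my own invention, not from any package, and it assumes the width divides 24 evenly and that the input is an integer hour from 0 to 23.

```r
# Hypothetical helper (not from the original post or any package):
# bin integer hours 0-23 into fixed-width, left-closed intervals like "20-22".
# Assumes `width` divides 24 evenly (1, 2, 3, 4, 6, 8, 12 or 24).
tod_bins <- function(hour, width = 2) {
  stopifnot(24 %% width == 0)
  breaks <- seq(0, 24, by = width)
  starts <- as.integer(head(breaks, -1))
  ends   <- as.integer(tail(breaks, -1) %% 24)  # 24 wraps to 00 in the last label
  labels <- sprintf("%02d-%02d", starts, ends)
  # right = FALSE gives [start, end) bins, so 20 goes to "20-22", not "18-20"
  cut(hour, breaks = breaks, labels = labels,
      right = FALSE, include.lowest = TRUE)
}

hours <- c(0, 4, 13, 20, 21, 23)
as.character(tod_bins(hours, width = 2))
# "00-02" "04-06" "12-14" "20-22" "20-22" "22-00"
as.character(tod_bins(hours, width = 3))
# "00-03" "03-06" "12-15" "18-21" "21-00" "21-00"
```

With the data above this would be used as df$tod3 <- tod_bins(df$hour, width = 3). Widths that do not divide 24 are rejected by the stopifnot() check rather than silently producing a short final bin.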
