简体   繁体   中英

time block coverage heat map data reshaping

I am trying to create a heat map using a very weird data structure

you can generate some sample data (admittedly very inefficient) with the following code:

times<-sort(format(seq.POSIXt(as.POSIXct(Sys.Date()),as.POSIXct(Sys.Date()+1),by = "5 min"),"%H%M"))
set.seed(922)
sample.data<-as.data.frame(matrix(NA,nrow = 2000,ncol = 10))
names(sample.data)<-c("INDEX","DAY1","START1","END1","DAY2","START2","END2","DAY3","START3","END3")
for(i in 1:nrow(sample.data)){
  sample.data[i,"INDEX"]<-sample(1:100,1,replace = T)
  sample.data[i,"DAY1"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START1"]<-sample(times,1,replace = T)
  sample.data[i,"END1"]<-sample(times,1,replace = T)
  sample.data[i,"DAY2"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START2"]<-sample(times,1,replace = T)
  sample.data[i,"END2"]<-sample(times,1,replace = T)
  sample.data[i,"DAY3"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START3"]<-sample(times,1,replace = T)
  sample.data[i,"END3"]<-sample(times,1,replace = T)
}

data<-sample.data%>%
  filter(START1<END1 & START2<END2 & START3<END3 & DAY1!=DAY2 & DAY1!=DAY3 & DAY2!=DAY3)

I know it's ugly and inefficient, but the data is roughly in this structure. You can think of this as the number of employees you have say at the airport at any given time where each row is the employees' shift times.

I want to create a heatmap with time of day broken into 5 minute segments on the y-axis, and Days of the Week on the x axis. Do I have to gather the columns and group by 5 minute time chunks? I have no clue.

If the data were in the right structure, I could group by weekday and the distinct 5 minute chunks, and tally every row where there was a observational unit at the airport. I just don't know how I'm going to get dplyr to say there's a person working without explicitly calling it out, and I don't know how to do that without a for loop. If I need to explain what I'm going for better, or if you have any bright ideas of how to get my data in the right form or if I'm even thinking about this in the right way, let me know. I've been banging my head against the desk, and I need to step away from the problem for a minute, but if it helps the heat map should come out if you execute the following plot code:

ggplot(data, aes(x = DAY, y = TIME_CHUNK))+
geom_tile(aes(fill = TOTAL_EMPLOYEES))+
geom_text(aes(label = TOTAL_EMPLOYEES), colour = "white",size = 3)

Thanks for your time...

Here's a partial solution that gets most of the way there. If I have time later I'll try to finish.

First, I'll reshape the data using a technique from here: https://stackoverflow.com/a/56605646/6851825

DAY <- grep("DAY", names(data))
START_END <- grep("START|END", names(data))
data_long <- cbind(stack(data, select = DAY), stack(data, select = START_END))
names(data_long) <- c("WEEKDAY", "DAYNUM", "TIME", "STATUS")

Here, I'll do some more reshaping to order the weekdays and convert TIME to a decimal, and to track the cumulative count in

library(tidyverse)
data_long_count <- data_long %>%
mutate(WEEKDAY = factor(WEEKDAY, levels = c("Sunday", "Monday", "Tuesday", 
                          "Wednesday", "Thursday", "Friday", "Saturday")),
       TIME_dec = as.numeric(TIME %>% str_sub(end = 2)) +
         as.numeric(TIME %>% str_sub(start = 3))/60,
       STATUS = STATUS %>% str_remove("[0-9]"),
       count_chg = if_else(STATUS == "START", 1, -1)) %>%
arrange(WEEKDAY, TIME_dec) %>%
mutate(employee_count = cumsum(count_chg)) 

[Missing step: fill in all the minutes with no change. Was going to use padr package for that, but it prefers to have a datetime or date . Or might use geom_rect to sidestep that.]

Without either of those, this heatmap is "spotty" b/c it only has stripes where the changes happen and not all the minutes between.

ggplot(data_long_count, aes(WEEKDAY, TIME_dec, fill = employee_count)) + geom_tile()

I think this should do it

clean_colnames <- function(col_inds) {
  data %>% select(INDEX, day = col_inds[1], start = col_inds[2], end = col_inds[3])
}

bind_rows(clean_colnames(2:4), clean_colnames(5:7), clean_colnames(8:10))  %>% 
  gather(key = start_end, value = time, -INDEX, -day) %>% 
  mutate(time = paste0("20190101 ", time) %>% lubridate::ymd_hm()) %>% 
  padr::pad(group = c("INDEX", "day")) %>% 
  count(day, time) %>% 
  mutate(time = paste0(substr(time, 12, 13), substr(time, 15, 16)))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM