
Time block coverage heat map data reshaping

I am trying to create a heat map from a very weird data structure.

You can generate some sample data (admittedly very inefficiently) with the following code:

times<-sort(format(seq.POSIXt(as.POSIXct(Sys.Date()),as.POSIXct(Sys.Date()+1),by = "5 min"),"%H%M"))
set.seed(922)
sample.data<-as.data.frame(matrix(NA,nrow = 2000,ncol = 10))
names(sample.data)<-c("INDEX","DAY1","START1","END1","DAY2","START2","END2","DAY3","START3","END3")
for(i in 1:nrow(sample.data)){
  sample.data[i,"INDEX"]<-sample(1:100,1,replace = T)
  sample.data[i,"DAY1"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START1"]<-sample(times,1,replace = T)
  sample.data[i,"END1"]<-sample(times,1,replace = T)
  sample.data[i,"DAY2"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START2"]<-sample(times,1,replace = T)
  sample.data[i,"END2"]<-sample(times,1,replace = T)
  sample.data[i,"DAY3"]<-sample(c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"),1,replace = F)
  sample.data[i,"START3"]<-sample(times,1,replace = T)
  sample.data[i,"END3"]<-sample(times,1,replace = T)
}

library(dplyr)  # provides %>% and filter()
data<-sample.data%>%
  filter(START1<END1 & START2<END2 & START3<END3 & DAY1!=DAY2 & DAY1!=DAY3 & DAY2!=DAY3)

I know it's ugly and inefficient, but the data is roughly in this structure. You can think of this as the number of employees you have at, say, the airport at any given time, where each row holds one employee's shift times.
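For anyone who wants a faster generator, here is a minimal vectorized sketch in base R. It assumes the same column layout as above; the names `data_vec`, `day_mat` and `draw_shift` are made up for illustration. It does not reproduce the same random draws as the loop. Instead it draws three distinct days and sorted start/end pairs directly, so no filtering step is needed afterwards.

```r
set.seed(922)
n <- 2000
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# All 5-minute marks of one day as "HHMM" strings: "0000", "0005", ..., "2355"
times <- sprintf("%02d%02d", rep(0:23, each = 12), rep(seq(0, 55, by = 5), times = 24))

# Three distinct days per row (sampling without replacement)
day_mat <- t(replicate(n, sample(days, 3)))

# Two distinct times per shift, sorted so that start < end
# (hypothetical helper, not part of the original code)
draw_shift <- function(n) t(replicate(n, sort(sample(times, 2))))
s1 <- draw_shift(n)
s2 <- draw_shift(n)
s3 <- draw_shift(n)

data_vec <- data.frame(
  INDEX = sample(1:100, n, replace = TRUE),
  DAY1 = day_mat[, 1], START1 = s1[, 1], END1 = s1[, 2],
  DAY2 = day_mat[, 2], START2 = s2[, 1], END2 = s2[, 2],
  DAY3 = day_mat[, 3], START3 = s3[, 1], END3 = s3[, 2],
  stringsAsFactors = FALSE
)
```

Because the time strings all have the same width, sorting them as text gives the same order as sorting them as clock times.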

I want to create a heatmap with time of day broken into 5-minute segments on the y-axis and days of the week on the x-axis. Do I have to gather the columns and group by 5-minute time chunks? I have no clue.

If the data were in the right structure, I could group by weekday and the distinct 5-minute chunks, and tally every row where there was an observational unit at the airport. I just don't know how to get dplyr to say there's a person working without explicitly calling it out, and I don't know how to do that without a for loop. If I need to explain better what I'm going for, or if you have any bright ideas on how to get my data into the right form, or on whether I'm even thinking about this the right way, let me know. I've been banging my head against the desk and I need to step away from the problem for a minute, but if it helps, the heat map should come out if you execute the following plot code:

ggplot(data, aes(x = DAY, y = TIME_CHUNK))+
geom_tile(aes(fill = TOTAL_EMPLOYEES))+
geom_text(aes(label = TOTAL_EMPLOYEES), colour = "white",size = 3)

Thanks for your time...

Here's a partial solution that gets most of the way there. If I have time later, I'll try to finish it.

First, I'll reshape the data using a technique from here: https://stackoverflow.com/a/56605646/6851825

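DAY <- grep("DAY", names(data))
# List all START columns before all END columns: cbind() recycles the stacked DAY
# values (DAY1, DAY2, DAY3, DAY1, ...), so this order keeps each time next to its own day.
START_END <- c(grep("START", names(data)), grep("END", names(data)))
data_long <- cbind(stack(data, select = DAY), stack(data, select = START_END))
names(data_long) <- c("WEEKDAY", "DAYNUM", "TIME", "STATUS")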
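As an alternative way to go from wide to long, here is a sketch using `tidyr::pivot_longer()` (this assumes tidyr 1.0 or later). The toy data frame `wide` and the column name `SHIFT` are made up for illustration. The special `".value"` entry in `names_to` tells tidyr to keep `DAY`, `START` and `END` as separate columns, so each day stays on the same row as its own start and end times.

```r
library(tidyr)

# Hypothetical two-employee example with two shifts each
wide <- data.frame(
  INDEX  = c(1, 2),
  DAY1   = c("Monday", "Tuesday"),
  START1 = c("0900", "1000"),
  END1   = c("1700", "1800"),
  DAY2   = c("Wednesday", "Friday"),
  START2 = c("0800", "1200"),
  END2   = c("1200", "2000"),
  stringsAsFactors = FALSE
)

# Split each column name into its stem (DAY/START/END) and its shift number
long <- pivot_longer(
  wide,
  cols = -INDEX,
  names_to = c(".value", "SHIFT"),
  names_pattern = "([A-Z]+)([0-9]+)"
)
long
```

The result has one row per employee and shift, with columns `INDEX`, `SHIFT`, `DAY`, `START` and `END`.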

Here, I'll do some more reshaping to order the weekdays, convert TIME to decimal hours, and track the cumulative count of employees:

library(tidyverse)
data_long_count <- data_long %>%
  mutate(WEEKDAY = factor(WEEKDAY, levels = c("Sunday", "Monday", "Tuesday",
                                              "Wednesday", "Thursday", "Friday", "Saturday")),
         TIME_dec = as.numeric(TIME %>% str_sub(end = 2)) +
           as.numeric(TIME %>% str_sub(start = 3))/60,
         STATUS = STATUS %>% str_remove("[0-9]"),
         count_chg = if_else(STATUS == "START", 1, -1)) %>%
  arrange(WEEKDAY, TIME_dec) %>%
  mutate(employee_count = cumsum(count_chg))
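To see why the running sum works, here is a tiny base R sketch on made-up data for a single day (the `events` data frame is hypothetical). Each START adds one person and each END removes one, so the cumulative sum over time-ordered events is the number of people on shift after each change.

```r
# Three toy shifts: 09:00-12:00, 10:00-11:00 and 11:30-13:00 (decimal hours)
events <- data.frame(
  TIME_dec = c(9, 12, 10, 11, 11.5, 13),
  STATUS   = c("START", "END", "START", "END", "START", "END")
)

# +1 when someone starts, -1 when someone leaves
events$count_chg <- ifelse(events$STATUS == "START", 1, -1)

# Order the events in time, then accumulate the changes
events <- events[order(events$TIME_dec), ]
events$employee_count <- cumsum(events$count_chg)
events$employee_count
# 1 2 1 2 1 0
```

The count returns to zero at the end of the day because every shift that starts also ends on that day.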

[Missing step: fill in all the minutes with no change. I was going to use the padr package for that, but it prefers to have a datetime or date. Alternatively, geom_rect might sidestep that.]

Without either of those, this heatmap is "spotty" because it only has stripes where the changes happen, not for all the minutes in between.

ggplot(data_long_count, aes(WEEKDAY, TIME_dec, fill = employee_count)) + geom_tile()
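One possible way to fill the missing step, sketched here on made-up data, is to skip the running sum and instead expand every shift into the 5-minute slots it covers, then count rows per weekday and slot. This is a different technique from the cumulative count above, not a completion of it. The `shifts` data frame, the `to_minutes` helper and the `slot` column are assumptions for illustration. The output column names `TIME_CHUNK` and `TOTAL_EMPLOYEES` are borrowed from the plot code in the question. The sketch assumes each shift has START < END and that both fall on 5-minute marks.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy shifts in long form: one row per employee shift
shifts <- data.frame(
  WEEKDAY = c("Monday", "Monday", "Tuesday"),
  START   = c("0900", "0910", "1400"),
  END     = c("0920", "0930", "1410"),
  stringsAsFactors = FALSE
)

# Convert "HHMM" strings to minutes since midnight (hypothetical helper)
to_minutes <- function(hhmm) {
  as.integer(substr(hhmm, 1, 2)) * 60L + as.integer(substr(hhmm, 3, 4))
}

coverage <- shifts %>%
  mutate(start_min = to_minutes(START), end_min = to_minutes(END)) %>%
  rowwise() %>%
  # Every 5-minute slot that begins inside the half-open interval [START, END)
  mutate(slot = list(seq(start_min, end_min - 5L, by = 5L))) %>%
  ungroup() %>%
  unnest(slot) %>%
  # Rows per weekday and slot = people on shift during that slot
  count(WEEKDAY, slot, name = "TOTAL_EMPLOYEES") %>%
  mutate(TIME_CHUNK = sprintf("%02d%02d", slot %/% 60, slot %% 60))

coverage
```

Slots that nobody covers do not appear in the result at all, so they would show up as gaps rather than zeros in a tile plot. Something like `tidyr::complete()` could add explicit zero rows if that matters.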

I think this should do it:

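library(dplyr)
library(tidyr)

# Pull one DAY/START/END triple (by column position) into common column names
clean_colnames <- function(col_inds) {
  data %>% select(INDEX, day = col_inds[1], start = col_inds[2], end = col_inds[3])
}

# Stack the three triples, go long on start/end, then let padr fill the gaps
bind_rows(clean_colnames(2:4), clean_colnames(5:7), clean_colnames(8:10))  %>% 
  gather(key = start_end, value = time, -INDEX, -day) %>% 
  # Attach a dummy date so the HHMM strings parse as datetimes
  mutate(time = paste0("20190101 ", time) %>% lubridate::ymd_hm()) %>% 
  # Insert rows for the missing time steps within each INDEX/day group
  padr::pad(group = c("INDEX", "day")) %>% 
  # Count rows per day and time
  count(day, time) %>% 
  # Convert the datetime back to an HHMM string
  mutate(time = paste0(substr(time, 12, 13), substr(time, 15, 16)))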
