
ggplot geom_tile adds an additional, unwanted column at start and end when x-axis values are not characters

Context

I am trying to generate a heatmap for the evaluation of an exercise on process scheduling. The rows (y-axis) correspond to processes and the columns (x-axis) to the current time step.

The data is available to me as a matrix with that layout. Since geom_tile expects a data frame, I use reshape::melt to transform the matrix into a data frame with one column denoting the process, one the current timestep, and one the value (see below).

| Process   | Timestep | value |
| --------- | -------- | ----- |
| Scheduler | 1        | 1     |
| 4         | 1        | 13    |
| 3         | 1        | 13    |
| 2         | 1        | 13    |
| 1         | 1        | 26    |
| Scheduler | 2        | 10    |
| 4         | 2        | 24    |
| 3         | 2        | 84    |
| 2         | 2        | 19    |
| 1         | 2        | 314   |

[...]

What is going wrong?

Without any intervention, the type of df$Timestep is integer.

[Image: type of Timestep in the df returned by melt, before plotting with ggplot]

Passing the data frame like this to ggplot yields the following result:

[Image: result without any modification]

As you can see, it looks like ggplot adds a column for t=0 and t=32, so before and after the actual first and last timestep:

[Image]

This is precisely the problem I would like to tackle.

What I have tried

Change data frame column type

If I change the type of the "Timestep" column in the data frame returned by reshape::melt to character by setting df$Timestep <- as.character(df$Timestep), ggplot no longer places the columns in numeric order (it does not seem to consider the natural order of the values), making the plot useless.

[Image: result after converting df$Timestep to character]
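The ordering problem can be reproduced without ggplot. The sketch below uses a made-up vector of twelve timestep labels (not the data from this post) to show that character values sort as strings, so "10" comes before "2". ggplot2 orders character values on a discrete axis the same way, which is a plausible cause of the scrambled columns. Setting the factor levels explicitly keeps the intended order.

```r
# Hypothetical timestep labels stored as character strings
timesteps <- as.character(1:12)

# Default factor levels are the sorted unique strings
alphabetical <- factor(timesteps)
print(levels(alphabetical))

# Passing the levels explicitly keeps the order of first appearance
natural <- factor(timesteps, levels = unique(timesteps))
print(levels(natural))
```

Comparing the two printed level vectors shows where the string sort diverges from the numeric order.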

Enforce data frame column type "character" during conversion

By prepending "t" to the timesteps, turning them into "t1", "t2", ..., ggplot still seems to understand the natural order of the axis values. However, this makes the axis look messy (the axis title "Timestep" next to labels t1, ...), so it is not a desirable solution either.

[Image: result after prepending a character]

Code

This is the code used for drawing the plots that you can see in the images within this post:

library(ggplot2)
library(reshape)  # provides melt()

drawHeatmap <- function(dataToDraw, solutionToHighlight, plotTitle) {
  rownames(dataToDraw) <- as.character(c("Scheduler", 4:1))

  # Used to prepend a character and enforce the df column type that way
  # colnames(dataToDraw) <- paste("t", 1:ncol(dataToDraw), sep="")
  colnames(dataToDraw) <- c(1:ncol(dataToDraw))
  df <- melt(dataToDraw)
  colnames(df) <- c("Process","Timestep","value")
  
  # Modify df types as needed
  # Changing df column type here makes ggplot not know how to deal with the natural ordering
  # df$Timestep <- as.character(df$Timestep)
  df$value <- as.integer(df$value)  # Convert in order to use continuous scale
  
  solution_path <- highlightCorrectPath(t(solutionToHighlight))
 
  ggplot() +
    coord_fixed() +
    scale_fill_gradientn(name = "Occurences", colors = c("blue", "green", "yellow", "orange", "red")) +
    geom_tile(data = df, aes(x=Timestep, y=Process, fill = value)) +
    scale_color_manual(values=c("black"), labels = c("Solution cells")) + 
    geom_rect(aes(xmin = solution_path[,"x_left"],
                  xmax = solution_path[,"x_right"],
                  ymin =  solution_path[,"y_bottom"],
                  ymax =  solution_path[,"y_top"], color = "Solution Cells"),
              fill=NA, size=1.1) +
    labs(title=plotTitle) +
    theme(legend.position = "bottom")
}

Solution

Adding df$Timestep <- factor(df$Timestep) fixed the issue. Thanks a lot to @stefan for suggesting this in the comments. However, I am still unsure what exactly caused ggplot to behave this way.
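A likely explanation, offered as a sketch rather than a confirmed diagnosis: with an integer Timestep, ggplot2 uses a continuous x scale, and continuous scales pad the axis by 5% of the data range on each side by default. Assuming timesteps 1 to 31 (as the t=0 and t=32 gaps suggest), the tiles span 0.5 to 31.5, so the padding is about 1.5 units per side, roughly the width of one extra tile. A factor gets a discrete scale, whose default padding is much smaller. The toy data frame and the x_range helper below are made up for illustration. scale_x_continuous(expand = ...) is shown as an alternative that keeps the column numeric.

```r
library(ggplot2)

# Made-up data standing in for the melted matrix: 5 processes x 31 timesteps
df <- expand.grid(Process = c("Scheduler", "4", "3", "2", "1"),
                  Timestep = 1:31, stringsAsFactors = FALSE)
df$value <- seq_len(nrow(df))

# Integer x: continuous scale with the default padding on both sides
p_padded <- ggplot(df, aes(x = Timestep, y = Process, fill = value)) +
  geom_tile()

# Same plot, but with the padding of the continuous x scale removed
p_tight <- p_padded + scale_x_continuous(expand = expansion(mult = 0))

# Factor x: discrete scale, as in the fix suggested in the comments
p_factor <- ggplot(df, aes(x = factor(Timestep), y = Process, fill = value)) +
  geom_tile()

# Hypothetical helper: read the final x-axis range of a built plot
x_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$x.range

print(x_range(p_padded))
print(x_range(p_tight))
print(x_range(p_factor))
```

Keeping Timestep numeric and removing the padding may be preferable when other layers, such as the geom_rect highlight in this post, rely on numeric x positions.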

For the full code and resulting plot see below.

Code

drawHeatmap <- function(dataToDraw, solutionToHighlight, plotTitle) {
  rownames(dataToDraw) <- as.character(c( "Dispatcher", "Scheduler", 3:1))
  colnames(dataToDraw) <- c(1:ncol(dataToDraw))
  df <- melt(dataToDraw)
  colnames(df) <- c("Process","Timestep","value")
  
  # Modify df types as needed
  df$value <- factor(df$value)
  df$Timestep <- factor(df$Timestep)
  
  plot_colors <- colorRampPalette(c("blue", "green", "yellow", "orange", "red"))(nlevels(df$value))
  plot_breaks <- seq(from = levels(df$value)[1], to = levels(df$value)[nlevels(df$value)], by = 10)
  
  solution_path <- highlightCorrectPath(t(solutionToHighlight))
 
  ggplot() +
    coord_fixed() +
    scale_fill_discrete(name = "Occurences", type = plot_colors, breaks = plot_breaks) +
    geom_tile(data = df, aes(x=Timestep, y=Process, fill = value)) +
    scale_color_manual(values=c("black"), labels = c("Solution cells")) + 
    geom_rect(aes(xmin = solution_path[,"x_left"],
                  xmax = solution_path[,"x_right"],
                  ymin =  solution_path[,"y_bottom"],
                  ymax =  solution_path[,"y_top"], color = "Solution Cells"),
              fill=NA, size=1.1) +
    labs(title=plotTitle) +
    theme(legend.position = "bottom")
}

Plot

[Image: plot after converting the Timestep column to a factor]
