
Adding a missing factor level to a ggplot2 heatmap

I have a ggplot2-based heatmap that renders counts of the occurrences of certain factors. However, different datasets sometimes don't have instances of some factors, which means that their respective heatmaps will look different. To make side-by-side comparison easier I'd like to add in missing levels. Unfortunately I've not been successful.

So, I have data that looks like this:

> head(numRules)
  Job Generation NumRules
1   0          0        2
2   0          1        1
3   0          2        1
4   0          3        1
5   0          4        1
6   0          5        1
> levels(factor(numRules$NumRules))
[1] "1" "2" "3"

I use the following code to render a nice heatmap that counts the number of rules per generation for all jobs:

ggplot(subset(numRules, Generation < 21), aes(x=Generation, y=factor(NumRules))) + 
   stat_bin(aes(fill=..count..), geom="tile", binwidth=1, position="identity") + 
   ylab('Number of Rules')

Heat map of count of number of rules by generation for all jobs

So the heat map is saying that most of the time the runs only have a single rule for a given generation, but sometimes you get two, and on rare occasions you'll get three.

Now an entirely different set of runs may actually have zero rules for a given generation. However, a side-by-side comparison would be a little confusing because the y axis of one heat map covers 1 to 3 rules while the other might cover 0 to 2. What I'd like to do is standardize the heatmaps so that they all have the factor levels (0, 1, 2, 3) regardless of which rule counts actually occur. E.g., I'd like to re-render the above heat map to include a row for zero rules even though there are no instances of that in that particular data frame.

I have battered this with various R incantations involving setting breaks and scales and whatnot to no avail. My intuition is that there is a simple solution to this, yet I'm unable to find it.

Update:

If I manually specify the levels in the call to factor I do get a row added for the zero rules:

ggplot(subset(numRules, Generation < 21), aes(x=Generation, y=factor(NumRules, levels=c("0","1","2","3")))) + 
   stat_bin(aes(fill=..count..), geom="tile", binwidth=1, position="identity") + 
   ylab('Number of Rules')

Which yields a heat map with an extra row for zero rules.

Unfortunately, as you can see this new row isn't properly colored. Getting there!

In this situation it would be easier to change your data. First, read in your data. Then convert the variable NumRules to a factor with all the necessary levels (from 0 to 3):

numRules = read.table(text="  Job Generation NumRules
1   0          0        2
2   0          1        1
3   0          2        1
4   0          3        1
5   0          4        1
6   0          5        1", header=TRUE)

numRules$NumRules = factor(numRules$NumRules, levels=c(0, 1, 2, 3))
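A quick way to check that the unused levels really are kept is to inspect the factor directly. This is only a minimal sketch: the vector num_rules is a made-up stand-in for numRules$NumRules, not the real data.

```r
# Hypothetical values standing in for numRules$NumRules
num_rules <- c(2, 1, 1, 1, 1, 1)

# Declaring the levels explicitly keeps 0 and 3 even though they never occur
num_rules_f <- factor(num_rules, levels = 0:3)

levels(num_rules_f)  # all four levels, including the unobserved ones
table(num_rules_f)   # per-level counts; unobserved levels get a count of 0
```

Because table() reports a zero for every declared level, any counts derived from this factor will include a row for zero rules.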

Now use table() to count how many times each combination of NumRules and Generation occurs in your data, and save the result to an object.

tab<-table(numRules$NumRules,numRules$Generation)
tab

    0 1 2 3 4 5
  0 0 0 0 0 0 0
  1 0 1 1 1 1 1
  2 1 0 0 0 0 0
  3 0 0 0 0 0 0

Use melt() from the reshape2 package to convert this table to long format, then rename the columns:

library(reshape2)
tab.long<-melt(tab)
colnames(tab.long)<-c("NumRules","Generation","Count")

Plot the new data frame with geom_tile(), mapping fill= to the column that contains the actual counts.

ggplot(tab.long, aes(x=Generation, y=NumRules, fill=Count)) +
  geom_tile() +
  ylab('Number of Rules')
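If you would rather not depend on reshape2, the table-to-long-format step can probably be done in base R as well. The sketch below assumes the same six sample rows as above and rebuilds them inline so that it runs on its own. Note that the Generation column comes back as a factor and is converted to integer here.

```r
# Same sample rows as above, rebuilt inline so the sketch is self-contained
numRules <- data.frame(
  Job        = 0,
  Generation = 0:5,
  NumRules   = factor(c(2, 1, 1, 1, 1, 1), levels = 0:3)
)

# as.data.frame() on a table gives one row per combination, zero counts included
tab.long <- as.data.frame(
  table(NumRules = numRules$NumRules, Generation = numRules$Generation),
  responseName = "Count"
)

# Both classifying columns are factors; turn Generation back into a number
tab.long$Generation <- as.integer(as.character(tab.long$Generation))

head(tab.long)
```

The resulting tab.long has the same three columns as the melted version, so it can be passed to the same ggplot() call.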


If all the NumRules you're interested in are levels of the factor, then you can fix this by just specifying drop=FALSE in scale_y_discrete():

numRules = read.table(text="  Job Generation NumRules
1   0          0        2
2   0          1        1
3   0          2        1
4   0          3        1
5   0          4        1
6   0          5        1", header=TRUE)

numRules$NumRules = factor(numRules$NumRules, levels=c(1, 2, 3))

ggplot(subset(numRules, Generation < 21), aes(x=Generation, y=NumRules)) +
  scale_y_discrete(drop=FALSE) +
  stat_bin(aes(fill=..count..), geom="tile", binwidth=1, position="identity") +
  ylab('Number of Rules')

Result:

Showing all factor levels
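For the side-by-side comparison in the question, one option is to build the level set once from both sets of runs and apply it to each, so that drop=FALSE shows the same rows in both plots. This is a sketch only: runs_a and runs_b are hypothetical rule counts, not data from the question.

```r
# Hypothetical rule counts from two different sets of runs
runs_a <- c(2, 1, 1, 1, 3)
runs_b <- c(0, 1, 2, 0, 1)

# One shared, sorted level set covering everything seen in either set
all_levels <- sort(union(runs_a, runs_b))

runs_a_f <- factor(runs_a, levels = all_levels)
runs_b_f <- factor(runs_b, levels = all_levels)

# Both factors now carry identical levels, so their y axes can line up
identical(levels(runs_a_f), levels(runs_b_f))
table(runs_a_f)
table(runs_b_f)
```

If the range is known in advance, as with 0 to 3 in the question, it is simpler to hard-code levels = 0:3 rather than derive the levels from the data.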
