
How to plot data from a 3-column data frame as a heatmap in R?

I'm new to R and would appreciate your help. I have a 3-column data frame that looks like this:

> head(data)
          V.hit    J.hit  frequency
1 IGHV1-62-3*00 IGHJ2*00 0.51937442
2   IGHV5-17*00 IGHJ3*00 0.18853542
3    IGHV3-5*00 IGHJ1*00 0.09777304
4    IGHV2-9*00 IGHJ3*00 0.03040866
5   IGHV5-12*00 IGHJ4*00 0.02900040
6   IGHV5-12*00 IGHJ2*00 0.00910554

This is just part of the data, as an example. I want to create a heatmap with "V.hit" on the x-axis, "J.hit" on the y-axis, and the frequency as the cell values (I'm interested in the frequency of each V+J combination). I tried this code for the interpolation:

library(akima)
newData <- with(data, interp(x = `V hit`, y = `J hit`, z = frequency))

but I'm getting this error:

Error in interp.old(x, y, z, xo, yo, ncp = 0, extrap = FALSE, duplicate = duplicate,  : 
  missing values and Infs not allowed

I don't know how to deal with it. The final output I want looks like this:

> head(fld)
# A tibble: 6 x 5
  ...1        `IGHJ1*00` `IGHJ2*00` `IGHJ3*00` `IGHJ4*00`
  <chr>            <dbl>      <dbl>      <dbl>      <dbl>
1 IGHV10-1*00  0.00233     0.00192   NA          0.000512
2 IGHV1-14*00 NA          NA          0.00104   NA       
3 IGHV1-18*00 NA           0.000914  NA         NA       
4 IGHV1-18*00 NA          NA          0.000131  NA       
5 IGHV1-19*00  0.0000131  NA         NA         NA       
6 IGHV1-26*00 NA           0.000214  NA         NA       

Cells that are "NA" should be set to "0". I assume I can then use the heatmap function to create my heatmap. Any help would be really appreciated!

Here is an idea using geom_tile(). Your data is called foo here. I created all possible combinations of V.hit and J.hit with complete() and asked it to fill the missing frequency values with 0. Then I used geom_tile() to produce the following graphic. You may want to adjust the order of the factor levels if necessary.

library(tidyverse)

complete(foo, V.hit, nesting(J.hit), fill = list(frequency = 0)) %>%
  ggplot(aes(x = J.hit, y = V.hit, fill = frequency)) +
  geom_tile()

[Image: heatmap of frequency by J.hit (x-axis) and V.hit (y-axis)]
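For readers who want the zero-filled wide table described in the question without loading tidyverse, here is a minimal base R sketch. It rebuilds only the six rows shown in head(data) under the name foo (the rest of the data is not available here) and uses xtabs and heatmap, which ship with R. Treat it as an illustration under those assumptions, not as this answer's method.

```r
## Six rows copied from head(data) in the question; the full data is assumed
## to have the same three columns
foo <- data.frame(
  V.hit = c("IGHV1-62-3*00", "IGHV5-17*00", "IGHV3-5*00",
            "IGHV2-9*00", "IGHV5-12*00", "IGHV5-12*00"),
  J.hit = c("IGHJ2*00", "IGHJ3*00", "IGHJ1*00",
            "IGHJ3*00", "IGHJ4*00", "IGHJ2*00"),
  frequency = c(0.51937442, 0.18853542, 0.09777304,
                0.03040866, 0.02900040, 0.00910554)
)

## xtabs sums frequency per V.hit/J.hit pair; pairs that never occur become 0
wide <- xtabs(frequency ~ V.hit + J.hit, data = foo)

## Convert the table to a plain numeric matrix, keeping the row/column names
m <- matrix(wide, nrow = nrow(wide), dimnames = dimnames(wide))

## Rowv/Colv = NA switch off clustering, scale = "none" keeps raw frequencies
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```

If a V.hit/J.hit pair appears more than once in the real data, xtabs adds the frequencies together, which may or may not be what you want.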

In base R we could adapt @GregSnow's solution for a correlation matrix to a frequency heatmap.

First, we cut the frequency vector, say into quartiles (the default of quantile()), to get factor values.

dat$freq.fac <- cut(dat$frequency, quantile(dat$frequency, na.rm=TRUE), include.lowest=TRUE)
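To see what this binning step does in isolation, here is a small sketch on a made-up vector (the values are hypothetical and chosen so the quartile cut points are easy to read). It shows that missing frequencies stay NA after cut(), which is why the later steps have to handle them separately.

```r
## Made-up frequencies with one missing value
freq <- c(0.05, 0.20, NA, 0.45, 0.70, 0.95)

## By default quantile() returns the 0%, 25%, 50%, 75% and 100% cut points
breaks <- quantile(freq, na.rm = TRUE)

## include.lowest = TRUE keeps the minimum value inside the first bin
freq.fac <- cut(freq, breaks, include.lowest = TRUE)

nlevels(freq.fac)    # number of quartile bins
is.na(freq.fac)      # the NA input is still NA after binning
```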

Second, to prepare the colors, we copy the factor column and replace its levels with the built-in heat.colors, plus white for the zero values.

dat <- within(dat, {
  freq.col <- freq.fac
  levels(freq.col) <- c(heat.colors(length(levels(dat$freq.fac)), rev=TRUE), "#FFFFFF")
})

Third, assign white to the NA cells and set their frequency to zero.

dat$freq.col[is.na(dat$freq.col)] <- "#FFFFFF"
dat$frequency[is.na(dat$frequency)] <- 0

Fourth, cross-tabulate with xtabs and build a color matrix by matching each cell's frequency to its color.

dat.x <- xtabs(frequency ~ v.hit + j.hit, dat)
col.m <- matrix(dat$freq.col[match(dat$frequency, as.vector(dat.x))], nrow=nrow(dat.x))

Finally, plot using the rasterImage function.

op <- par(mar=c(.5, 4, 4, 3)+.1)  ## adapt outer margins
plot.new()
plot.window(xlim=c(0, 5), ylim=c(0, 5))
rasterImage(col.m, 0, 1, 5, 5, interpolate=FALSE)
rect(0, 1, 5, 5)  ## frame it with a box
## numbers in the cells
text(col(round(dat.x, 3)) - .5, 5.45 - row(round(dat.x, 3))*.8, round(dat.x, 3))
mtext("Frequency heatmap", 3, 2, font=2, cex=1.2)  ## title
mtext(rownames(dat.x), 2, at=5.45 -(1:5)*.8, las=2)  ## y-axis
mtext(colnames(dat.x), 3, at=(1:5)-.5)  ## x-axis (upper)
## a legend
legend(-.15, .75, legend=c("Frequency:\t", 0, paste("<", seq(.25, 1, .25))), horiz=TRUE, 
      pch=c(NA, rep(22, 5)), col=1, pt.bg=c(NA, levels(dat$freq.col)[c(5, 1:4)]), 
      bty="n", xpd=TRUE, cex=.75, text.font=2)
par(op)  ## reset margins

Yields

[Image: resulting frequency heatmap]


Toy data:

dat <- structure(list(v.hit = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 
        3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
        4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"), 
            j.hit = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
            3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L
            ), .Label = c("F", "G", "H", "I", "J"), class = "factor"), 
            frequency = c(NA, NA, 0.717618508264422, NA, NA, 0.777445221319795, 
            NA, 0.212142521282658, 0.651673766085878, 0.125555095961317, 
            NA, 0.386114092543721, 0.0133903331588954, NA, 0.86969084572047, 
            0.34034899668768, 0.482080115471035, NA, 0.493541307048872, 
            0.186217601411045, 0.827373318606988, NA, 0.79423986072652, 
            0.107943625887856, NA)), row.names = c(NA, -25L), class = "data.frame")

You can interpolate with a linear model if the variables are correlated. The code below assumes a data frame df with numeric columns x, y and z.


library(plotly)

## fit z on all other columns of df (here x and y)
mdl <- lm(z ~ ., df)

## 101 evenly spaced grid points per axis
xs <- seq(min(df$x), max(df$x), length.out = 101)
ys <- seq(min(df$y), max(df$y), length.out = 101)

## predict z over the grid; rows of out follow y, columns follow x
grid <- expand.grid(y = ys, x = xs)
out <- matrix(predict(mdl, grid), nrow = length(ys))

fig <- plot_ly(z = out, colorscale = "Hot", type = "heatmap")
fig <- fig %>% layout(
    title = "Interpolated Heatmap of Z Given x, y",
    xaxis = list(
        title = "x"
    ),
    yaxis = list(
        title = "y"
    )
)
fig

[Image: interpolated heatmap of z given x and y]
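Since this answer does not define df, here is a self-contained sketch of the same idea on a hypothetical toy data frame where z is an exact linear function of x and y, so the predictions are easy to check by hand. It uses base R's image() instead of plotly purely to avoid the extra dependency; that swap is an assumption of this sketch, not part of the answer.

```r
## Hypothetical toy data: z = 2x + 3y + 1 on a small integer grid
df <- expand.grid(x = 0:4, y = 0:3)
df$z <- 2 * df$x + 3 * df$y + 1

## Linear model of z on the remaining columns (x and y)
mdl <- lm(z ~ ., df)

## 101 evenly spaced points per axis, spanning the observed range
xs <- seq(min(df$x), max(df$x), length.out = 101)
ys <- seq(min(df$y), max(df$y), length.out = 101)

## y varies fastest in the grid, so filling column-wise gives
## rows = y values and columns = x values
grid <- expand.grid(y = ys, x = xs)
out <- matrix(predict(mdl, grid), nrow = length(ys))

## image() expects z[i, j] at (x[i], y[j]), hence the transpose
image(xs, ys, t(out), xlab = "x", ylab = "y")
```

Note that this kind of interpolation only makes sense for numeric x and y. For categorical axes such as V.hit and J.hit in the question, a zero-filled cross-tabulation is the more natural route.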

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address.

 