
Making a matrix from a data frame to make a heatmap in R

I want to make a heatmap in R, but I could not convert my data frame into an appropriate matrix. The data frame has three columns (protein, value and treatment), and the protein and treatment columns contain repeated entries. Could someone help me build an appropriate matrix from this data, and suggest which package is best for making a heatmap? I have 1000 proteins (some of them repeated across treatments) and 4 treatment groups. I am a beginner in R and really need your help. Thank you in advance.

Example:

protein           value    treatment
EPN1              0.986    treat1
LAMB1             0.881    treat2
PKP4              0.827    treat2
PKP2              0.739    treat3
BAIAP2            0.519    treat2
UTRN              0.502    treat4
REPS2             0.481    treat2
PKP4              0.365    treat1
LAMC1            -0.529    treat2
PPIB              2.86     treat4

You can use either of these two approaches.

df <- read.table(header = TRUE, text = "protein           value    treatment
EPN1              0.986    treat1
LAMB1             0.881    treat2
PKP4              0.827    treat2     
PKP2              0.739    treat3     
BAIAP2            0.519    treat2     
UTRN              0.502    treat4     
REPS2             0.481    treat2     
PKP4              0.365    treat1      
LAMC1            -0.529    treat2     
PPIB              2.86     treat4 ")

library(tidyverse)
df %>% ggplot(aes(x= treatment, y = protein, fill = value)) +
  geom_tile()


OR

library(echarts4r)

df |> 
  e_charts(protein) |> 
  e_heatmap(treatment, value) |> 
  e_visual_map(value)
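
Both plots above draw one tile per protein/treatment pair, so a pair that occurs more than once in the data would be drawn on top of itself. A minimal sketch of one way to collapse such repeats first, assuming the mean is an acceptable summary (that choice, the name ex, and the duplicated PKP4/treat2 row are illustrative assumptions, not part of the question):

```r
# Small example in the same long format as the question.
# The second PKP4/treat2 row is made up to show a repeated pair.
ex <- data.frame(
  protein   = c("EPN1", "PKP4", "PKP4", "PKP4", "UTRN"),
  value     = c(0.986, 0.827, 0.427, 0.365, 0.502),
  treatment = c("treat1", "treat2", "treat2", "treat1", "treat4")
)

# Average repeated protein/treatment pairs so each tile gets one value.
ex_mean <- aggregate(value ~ protein + treatment, data = ex, FUN = mean)

# Returns 0 when no protein/treatment pair is repeated any more.
anyDuplicated(ex_mean[c("protein", "treatment")])
```

The aggregated data frame can then be passed to ggplot() or e_charts() in place of df.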


Here is some example data.

dat <- data.frame(
    protein=replicate(100, paste(sample(LETTERS, 4), collapse="")),
    value=rnorm(100),
    treatment=paste0("treat", sample(1:4, 100, replace=TRUE)),
    stringsAsFactors=FALSE
)

Using ggplot2 you could do

library(ggplot2)
plt <- ggplot(dat, aes(treatment, protein, fill=value)) + geom_tile()
plt  # print the plot

More options can be found here: https://www.r-graph-gallery.com/79-levelplot-with-ggplot2.html

However, I don't know how to deal with a lot of proteins to plot (as you mentioned). Do you need to see the names of the proteins?

EDIT: one possibility for 1000 proteins would be to make the chart really long, like so:

ggsave(
    "long.pdf", plot=plt, device="pdf", 
    width=21, height=150, units="cm", limitsize=FALSE
)

This creates a PDF in the current working directory. Using the zoom function of your PDF viewer, you can then navigate to the rows of interest.
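
Rather than hard-coding the height, it could be derived from the number of distinct proteins. A rough sketch, where the per-row height and the margin are guessed constants (assumptions, not values from this answer) and the proteins vector stands in for dat$protein:

```r
# Assumed sizing rule: about 0.4 cm per protein row plus a fixed margin.
# Both constants are illustrative guesses and will need tuning.
row_height_cm <- 0.4
margin_cm <- 4

# Stand-in for dat$protein; PKP4 appears twice but needs only one row.
proteins <- c("EPN1", "LAMB1", "PKP4", "PKP2", "PKP4")
n_rows <- length(unique(proteins))

height_cm <- margin_cm + row_height_cm * n_rows
height_cm
```

The result can be passed as height=height_cm to ggsave(); limitsize=FALSE is still needed once the page gets taller than ggsave's default limit.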

EDIT 2: For more complex charts I (still) rely on base R, though there may be ggplot-style packages I am not aware of. A base R solution requires converting the data into a matrix first. One approach is to use a sparse matrix, like this:

dim_x <- unique(dat$protein)
dim_y <- unique(dat$treatment)
map_x <- setNames(seq_along(dim_x), dim_x)
map_y <- setNames(seq_along(dim_y), dim_y)

library(Matrix)
mat <- sparseMatrix(
    i=map_x[dat$protein], j=map_y[dat$treatment], x=dat$value, 
    dims=c(length(dim_x), length(dim_y)), dimnames=list(dim_x, dim_y)
)

Then you can use the base R heatmap function,

heatmap(as.matrix(mat))

or a more customizable function such as

library(pheatmap)
pheatmap(as.matrix(mat))

Both show dendrograms.
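
One caveat with the sparse matrix: protein/treatment combinations that are absent from the data come out as 0, and by default sparseMatrix() sums the values of repeated pairs. If that is not wanted, a base R alternative is tapply(), sketched here under the assumption that repeats should be averaged (the names ex and mat_na are only for illustration):

```r
# Example long-format data, using a few rows from the question.
ex <- data.frame(
  protein   = c("EPN1", "LAMB1", "PKP4", "PKP2", "PKP4"),
  value     = c(0.986, 0.881, 0.827, 0.739, 0.365),
  treatment = c("treat1", "treat2", "treat2", "treat3", "treat1")
)

# Rows = proteins, columns = treatments. Repeated pairs are averaged,
# and combinations that never occur stay NA instead of becoming 0.
mat_na <- tapply(ex$value, list(ex$protein, ex$treatment), mean)
mat_na
```

The result is an ordinary numeric matrix. Something like heatmap(mat_na, Rowv=NA, Colv=NA) draws it without clustering; with many NA cells the clustering step may fail, so the missing values might need to be filled in first.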
