
Making a matrix from a data frame to make a heatmap in R

I want to make a heatmap in R, but I have not been able to convert my data frame into an appropriate matrix. The data frame has three columns (protein, value and treatment), and there are repeated entries in the protein and treatment columns. Could someone help me build a suitable matrix from this data, and suggest which package is best for making the heatmap? I have 1000 proteins (some of them repeated across treatments) and 4 treatment groups. I am a beginner in R and really need your help. Thank you in advance.

Example:
protein           value    treatment
EPN1              0.986    treat1
LAMB1             0.881    treat2
PKP4              0.827    treat2
PKP2              0.739    treat3
BAIAP2            0.519    treat2
UTRN              0.502    treat4
REPS2             0.481    treat2
PKP4              0.365    treat1
LAMC1            -0.529    treat2
PPIB              2.86     treat4

You can do either of these two:

df <- read.table(header = TRUE, text = "protein           value    treatment
EPN1              0.986    treat1
LAMB1             0.881    treat2
PKP4              0.827    treat2     
PKP2              0.739    treat3     
BAIAP2            0.519    treat2     
UTRN              0.502    treat4     
REPS2             0.481    treat2     
PKP4              0.365    treat1      
LAMC1            -0.529    treat2     
PPIB              2.86     treat4 ")

library(tidyverse)
df %>% ggplot(aes(x= treatment, y = protein, fill = value)) +
  geom_tile()


Or:

library(echarts4r)

df |> 
  e_charts(protein) |> 
  e_heatmap(treatment, value) |> 
  e_visual_map(value)


Here is some example data.

dat <- data.frame(
    protein=replicate(100, paste(sample(LETTERS, 4), collapse="")),
    value=rnorm(100),
    treatment=paste0("treat", sample(1:4, 100, replace=TRUE)),
    stringsAsFactors=FALSE
)

Using ggplot2 you could do:

library(ggplot2)
plt <- ggplot(dat, aes(treatment, protein, fill=value)) + geom_tile()

You can find more options here: https://www.r-graph-gallery.com/79-levelplot-with-ggplot2.html

However, I don't know how best to plot that many proteins (as you mentioned). Do you need to see the names of the proteins?

EDIT: One possibility for 1000 proteins would be to make the chart really long, like so:

ggsave(
    "long.pdf", plot=plt, device="pdf", 
    width=21, height=150, units="cm", limitsize=FALSE
)

This creates a PDF in the current working directory. Using the zoom function of your PDF viewer, you can then navigate to the rows of interest.
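If the protein names do not need to be readable, another option is to drop the y-axis labels so that 1000 rows fit on a normal-sized plot. This is only a minimal sketch, assuming ggplot2 is installed; the data frame `big`, the protein names in it, and `plt_compact` are made up for illustration.

```r
library(ggplot2)

# Simulated long-format data: 1000 hypothetical proteins x 4 treatments
big <- expand.grid(
    protein   = sprintf("P%04d", 1:1000),
    treatment = paste0("treat", 1:4),
    stringsAsFactors = FALSE
)
big$value <- rnorm(nrow(big))

# Same tile plot as before, but without y-axis text and tick marks
plt_compact <- ggplot(big, aes(treatment, protein, fill = value)) +
    geom_tile() +
    theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
```

Printing `plt_compact` (or passing it to `ggsave`) draws the chart; the rows are still there, they are just no longer labelled.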

EDIT 2: For more complex charts I (still) rely on base R, though there may be ggplot-style packages I am not aware of. A base R solution requires converting the data into a matrix first. One approach is to use a sparse matrix like this:

dim_x <- unique(dat$protein)
dim_y <- unique(dat$treatment)
map_x <- setNames(seq_along(dim_x), dim_x)
map_y <- setNames(seq_along(dim_y), dim_y)

library(Matrix)
mat <- sparseMatrix(
    i=map_x[dat$protein], j=map_y[dat$treatment], x=dat$value, 
    dims=c(length(dim_x), length(dim_y)), dimnames=list(dim_x, dim_y)
)
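One caveat with the sparse-matrix route: protein/treatment combinations that never occur in the data end up as 0, and duplicated (protein, treatment) pairs are summed. If you would rather keep missing combinations as NA and average the duplicates, a base R sketch with `tapply` could look like this. The small data frame `ex` and the name `wide` are made up for illustration, and averaging is an assumption about what you want to do with repeats.

```r
# Small made-up data in the same long format as the question
ex <- data.frame(
    protein   = c("EPN1", "PKP4", "PKP4", "PKP4", "UTRN"),
    value     = c(0.986, 0.827, 0.365, 0.635, 0.502),
    treatment = c("treat1", "treat2", "treat1", "treat1", "treat4"),
    stringsAsFactors = FALSE
)

# Rows = proteins, columns = treatments; duplicates are averaged,
# combinations that never occur become NA
wide <- tapply(ex$value, list(ex$protein, ex$treatment), mean)
wide
```

Note that clustering can fail when many cells are NA; with base `heatmap` you can switch the dendrograms off via `Rowv = NA, Colv = NA` in that case.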

Then you can use the base R heatmap function,

heatmap(as.matrix(mat))

or a more customizable function such as

library(pheatmap)
# Convert to a dense matrix first, as for heatmap()
pheatmap(as.matrix(mat))

Both show dendrograms.
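To give an idea of what customizing pheatmap can look like with many proteins, here is a hedged sketch. It assumes the pheatmap package is installed and uses a made-up random matrix `m` in place of real data; whether row scaling is appropriate depends on your values.

```r
library(pheatmap)

# Made-up 50 x 4 matrix standing in for proteins x treatments
set.seed(1)
m <- matrix(
    rnorm(200), nrow = 50,
    dimnames = list(sprintf("P%02d", 1:50), paste0("treat", 1:4))
)

res <- pheatmap(
    m,
    scale = "row",          # centre and scale each protein across treatments
    show_rownames = FALSE,  # hide row labels when there are many proteins
    cluster_cols = FALSE,   # keep treatments in their original order
    silent = TRUE           # build the plot object without drawing it
)
```

Drop `silent = TRUE` to draw the heatmap directly.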
