Making a matrix from a data frame to make a heatmap in R
I want to make a heatmap in R, but I could not convert my data frame into an appropriate matrix form. I have a data frame with three columns (protein, value and treatment), and there are repeated entries in the protein and treatment columns.
Could someone help me build an appropriate matrix from this data, and suggest which package is better for making the heatmap? I have 1000 proteins (some of them are repeated across treatments) and 4 treatment groups.
I am a beginner in R and really need your help. Thank you in advance.
example:
protein value treatment
EPN1 0.986 treat1
LAMB1 0.881 treat2
PKP4 0.827 treat2
PKP2 0.739 treat3
BAIAP2 0.519 treat2
UTRN 0.502 treat4
REPS2 0.481 treat2
PKP4 0.365 treat1
LAMC1 -0.529 treat2
PPIB 2.86 treat4
You can do either of these two:
df <- read.table(header = TRUE, text = "protein value treatment
EPN1 0.986 treat1
LAMB1 0.881 treat2
PKP4 0.827 treat2
PKP2 0.739 treat3
BAIAP2 0.519 treat2
UTRN 0.502 treat4
REPS2 0.481 treat2
PKP4 0.365 treat1
LAMC1 -0.529 treat2
PPIB 2.86 treat4 ")
library(tidyverse)
df %>% ggplot(aes(x= treatment, y = protein, fill = value)) +
geom_tile()
Or:
library(echarts4r)
df |>
e_charts(protein) |>
e_heatmap(treatment, value) |>
e_visual_map(value)
Here is some example data.
dat <- data.frame(
protein=replicate(100, paste(sample(LETTERS, 4), collapse="")),
value=rnorm(100),
treatment=paste0("treat", sample(1:4, 100, replace=TRUE)),
stringsAsFactors=FALSE
)
Using ggplot2 you could do:
library(ggplot2)
plt <- ggplot(dat, aes(treatment, protein, fill=value)) + geom_tile()
You can find more options here: https://www.r-graph-gallery.com/79-levelplot-with-ggplot2.html
However, I don't know how best to plot such a large number of proteins (as you mentioned). Do you need to see the names of the proteins?
EDIT: one possibility for 1000 proteins would be to make the chart really long, like so:
ggsave(
"long.pdf", plot=plt, device="pdf",
width=21, height=150, units="cm", limitsize=FALSE
)
This creates a PDF in the current folder. Using the zoom function of your PDF viewer, you can then navigate to the rows of interest.
EDIT 2: For more complex charts I (still) rely on base R, but maybe there are some ggplot-style packages I am not aware of. A base R solution requires converting the data into a matrix first. One approach would be to use a sparse matrix like this:
dim_x <- unique(dat$protein)
dim_y <- unique(dat$treatment)
map_x <- setNames(seq_along(dim_x), dim_x)
map_y <- setNames(seq_along(dim_y), dim_y)
library(Matrix)
mat <- sparseMatrix(
i=map_x[dat$protein], j=map_y[dat$treatment], x=dat$value,
dims=c(length(dim_x), length(dim_y)), dimnames=list(dim_x, dim_y)
)
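One caveat with the sparse matrix approach: as far as I know, sparseMatrix() adds up the values of repeated (i, j) pairs, so a protein measured twice under the same treatment would end up as a sum, and combinations that never occur show up as 0. A minimal sketch of averaging such repeats first, using made-up toy data (the names toy, agg, rows, cols and m are only for illustration):

```r
library(Matrix)

# Hypothetical toy data: the pair (A, t1) appears twice.
toy <- data.frame(
  protein   = c("A", "A", "B"),
  treatment = c("t1", "t1", "t2"),
  value     = c(1, 2, 5),
  stringsAsFactors = FALSE
)

# Collapse repeated protein/treatment pairs to their mean.
agg <- aggregate(value ~ protein + treatment, data = toy, FUN = mean)

# Row and column labels, and the positions of each observation.
rows <- unique(as.character(agg$protein))
cols <- unique(as.character(agg$treatment))

m <- sparseMatrix(
  i = match(agg$protein, rows), j = match(agg$treatment, cols), x = agg$value,
  dims = c(length(rows), length(cols)), dimnames = list(rows, cols)
)
```

Whether the mean is the right summary (rather than, say, the median or the maximum) depends on your experiment, so treat FUN = mean as an assumption.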
Then you can use the base R heatmap function,
heatmap(as.matrix(mat))
or a more customizable function like
library(pheatmap)
pheatmap(mat)
both of which show dendrograms.
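If you would rather have an ordinary dense matrix straight from the long data frame, base R's tapply() can do the reshaping in one step. This is only a sketch on the ten rows from the question; mat_dense and mat_filled are names I made up, and averaging repeated pairs is my assumption about what you want:

```r
# The example rows from the question, in long format.
df <- read.table(header = TRUE, text = "protein value treatment
EPN1 0.986 treat1
LAMB1 0.881 treat2
PKP4 0.827 treat2
PKP2 0.739 treat3
BAIAP2 0.519 treat2
UTRN 0.502 treat4
REPS2 0.481 treat2
PKP4 0.365 treat1
LAMC1 -0.529 treat2
PPIB 2.86 treat4")

# Rows = proteins, columns = treatments. Repeated protein/treatment pairs
# are averaged; combinations that never occur become NA.
mat_dense <- tapply(df$value, list(df$protein, df$treatment), mean)

# Clustering with many NAs can fail, so one (debatable) option is to
# replace missing cells by 0 before plotting.
mat_filled <- mat_dense
mat_filled[is.na(mat_filled)] <- 0
```

You could then pass mat_filled to heatmap() or pheatmap(). Keep in mind that filling with 0 treats "not measured" as "no change", which may or may not be acceptable for your data.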