
ComplexHeatmap with multiple files plotting

I would like to use ComplexHeatmap to plot several files, one heatmap per data frame or file.

So far I have been able to do this for a small subset of files.

Reading the files as a list

list_of_files <- list.files('Model_hmap/',pattern = '\\.txt$', full.names = TRUE)


#Further arguments to read.csv can be passed in ...
#all_csv <- lapply(list_of_files,read_delim,delim = "\t", escape_double = FALSE,trim_ws = TRUE)

all_csv <- lapply(list_of_files,read.table,strip.white = FALSE,check.names = FALSE,header=TRUE,row.names=1)
#my_names = c("gene","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj","UP_DOWN")
my_names = c("Symbol","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj","UP_DOWN")

#my_names = c['X2']

#my_names = c("Peak","annotation","ENSEMBL","log2FoldChange","padj","UP_DOWN")
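# Drop the non-expression columns (1 to 7 and 155)
result_abd = lapply(all_csv, FUN = function(x) subset(x, select=-c(1:7,155)))

# Name each element after its source file, reusing the file list that was actually read
names(result_abd) <- gsub(".txt","",
                          basename(list_of_files),
                          fixed = TRUE)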

Then scaling the data

# Row-wise z-score: scale() works on columns, so transpose before and after
fun <- function(x) {
  t(scale(t(x)))
}

p2 <- lapply(result_abd, fun)
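
As a quick check of what the scaling step does, the sketch below applies the same `t(scale(t(x)))` idiom to a small made-up matrix (the values and the gene/sample names are purely illustrative). Each row ends up with mean 0 and standard deviation 1. A row with no variation at all would come out as `NaN`, so such rows may need to be filtered out beforehand.

```r
# Toy expression matrix: 3 genes (rows) x 4 samples (columns); values are made up
toy <- matrix(c(1, 2, 3, 4,
                10, 20, 30, 40,
                5, 7, 9, 11),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# scale() standardises columns, so transpose to standardise rows instead
row_zscore <- function(x) t(scale(t(x)))

toy_z <- row_zscore(toy)

# Per-gene mean and standard deviation across samples after scaling
round(rowMeans(toy_z), 10)
apply(toy_z, 1, sd)
```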

The next step is to use the metadata with which I would like to annotate my heatmap.

My metadata looks like this:

dput(head(metadata))
structure(list(patient = c("TCGA-AB-2856", "TCGA-AB-2849", "TCGA-AB-2971", 
"TCGA-AB-2930", "TCGA-AB-2891", "TCGA-AB-2872"), prior_malignancy = c("no", 
"no", "no", "no", "no", "no"), FAB = c("M4", "M0", "M4", "M2", 
"M1", "M3"), Risk_Cyto = c("Intermediate", "Poor", "Intermediate", 
"Intermediate", "Poor", "Good")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

To read the above metadata I am doing the following; I am not sure whether it is the right approach.

list_of_files1 <- list.files('Model_hmap_meta/',pattern = '\\.txt$', full.names = TRUE)
#Further arguments to read.csv can be passed in ...
meta1 <- lapply(list_of_files1,read.table, row.names = 1,sep = "\t",header = TRUE)

Now I am stuck at the above step. I am not sure how to pass the metadata as a list, as I did for the gene expression data frames, whose z-scores I calculated and stored in a list. I think the metadata should be of the same class if I am to use it this way.

For a single file, this is how I add the annotation to my final plot:
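
One way to handle this is to keep the metadata in a list with the same names as the expression list and walk over both with `Map()`. The sketch below uses base R only and makes two assumptions that the question does not confirm: that the column names of each expression matrix are patient IDs, and that each metadata table has a `patient` column to match on. The helper name `align_metadata` and the toy data are hypothetical.

```r
# Toy stand-ins for the z-scored matrices and the metadata tables (made-up values)
expr_list <- list(
  modelA = matrix(1:6, nrow = 2,
                  dimnames = list(c("g1", "g2"),
                                  c("TCGA-AB-2849", "TCGA-AB-2856", "TCGA-AB-2971"))),
  modelB = matrix(1:4, nrow = 2,
                  dimnames = list(c("g1", "g2"),
                                  c("TCGA-AB-2930", "TCGA-AB-2891")))
)
meta_list <- list(
  modelA = data.frame(patient = c("TCGA-AB-2856", "TCGA-AB-2849", "TCGA-AB-2971"),
                      FAB = c("M4", "M0", "M4"),
                      Risk_Cyto = c("Intermediate", "Poor", "Intermediate")),
  modelB = data.frame(patient = c("TCGA-AB-2891", "TCGA-AB-2930"),
                      FAB = c("M1", "M2"),
                      Risk_Cyto = c("Poor", "Intermediate"))
)

# Hypothetical helper: reorder metadata rows to follow the matrix columns
align_metadata <- function(mat, meta, id_col = "patient") {
  idx <- match(colnames(mat), meta[[id_col]])
  if (anyNA(idx)) stop("some samples have no metadata row")
  meta[idx, , drop = FALSE]
}

# Pair the two lists by name so each matrix gets its own metadata
meta_aligned <- Map(align_metadata, expr_list, meta_list[names(expr_list)])

meta_aligned$modelA$FAB
```

Each element of `meta_aligned` can then be turned into an annotation for the matrix of the same name.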

metadata =  read_delim("Model_hmap_meta/FAB_table.txt",delim = "\t", escape_double = FALSE, 
                       trim_ws = TRUE)
head(metadata)
dim(metadata)
ann <- data.frame(metadata$FAB, metadata$Risk_Cyto)
colnames(ann) <- c('FAB', 'Risk_Cyto')
colours <- list('FAB' = c('M0' = 'red2', 'M1' = 'royalblue', 'M2'='gold','M3'='forestgreen','M4'='chocolate','M5'='Purple'),
                'Risk_Cyto' = c('Good' = 'limegreen', 'Intermediate' = 'navy' , 'N.D.' ='magenta','Poor'='black'))
colAnn <- HeatmapAnnotation(df = ann,
                            which = 'col',
                            col = colours,
                            annotation_width = unit(c(1, 4), 'cm'),
                            gap = unit(1, 'mm'))

This is what I need to apply across the list, if I understand correctly, and it is what I am not able to do.

My plotting function

This is the code I use to plot:

hm1 <- Heatmap(heat,
               col= colorRamp2(c(-2.6,-1,0,1,2.6),c("blue","skyblue","white","lightcoral","red")),

                              #heatmap_legend_param=list(at=c(-2.6,-1,0,1,2.6),color_bar="continuous",
                #                         legend_direction="vertical", legend_width=unit(5,"cm"),
                 #                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")),
               name = "Z-score",
               
               #Row annotation configurations
               cluster_rows=T,
               show_row_dend=FALSE,
               row_title_side="right",
               row_title_gp=gpar(fontsize=8),
               show_row_names=FALSE,
               row_names_side="left",
               
               #Column annotation configurations
               cluster_columns=T,
               show_column_dend=T,
               column_title="DE genes",
               column_title_side="top",
               column_title_gp=gpar(fontsize=15, fontface="bold"),
               show_column_names = FALSE,
               column_names_gp = gpar(fontsize = 12, fontface="bold"),
               
               #Dendrogram configurations: columns
               clustering_distance_columns="euclidean",
               clustering_method_columns="complete",
               column_dend_height=unit(10,"mm"),
               
               #Dendrogram configurations: rows
               clustering_distance_rows="euclidean",
               clustering_method_rows="complete",
               row_dend_width=unit(4,"cm"),
               row_dend_side = "left",
               row_dend_reorder = TRUE,
               
               #Splits
               border=T,
               row_km = 1,
               column_km = 1,
               
               #plot params
               #width = unit(5, "inch"),
               #height = unit(4, "inch"),
               #height = unit(0.4, "cm")*nrow(mat),
               
               #Annotations
               top_annotation = colAnn)

# plot heatmap
draw(hm1, annotation_legend_side = "right", heatmap_legend_side="right")

Objective: how do I wrap all of the above into a small function that takes multiple files as input and plots them?

UPDATE: data files

My data files, my metadata file

Using the code you provided I made the following function (make_heatmap). Some of the read-in statements are altered to match what I was working with on my machine. I also only used 2 of your files, but it should work with all 4 that you're using.

This function lets you pass in the counts matrix (which you normalize and set up before passing it to the function). The assumption is that you're using the same metadata/annotation for each file you're passing. If you have different annotation files, you could set up the heatmap annotation before calling the function and then pass that to the function. This is a bit more tedious, though.

Usually the way I set up my heatmap analyses is that I have a script containing all of my functions (one for each type of heatmap I have to make), and then every time I need to make a new heatmap I have another script where I read in and prepare (i.e. median center) my counts matrix and then call the heatmap function I need.
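
If each matrix does need its own annotation, a variant of the function can take the annotation table as a second argument instead of reading a fixed file. The following is only a sketch of that idea: `make_annotated_heatmap`, its arguments, and the toy data are hypothetical, most of the styling options are omitted, and it assumes the rows of `ann` are already in the same order as the columns of the matrix.

```r
# Colour mappings taken from the question
colours <- list(
  FAB = c("M0" = "red2", "M1" = "royalblue", "M2" = "gold",
          "M3" = "forestgreen", "M4" = "chocolate", "M5" = "purple"),
  Risk_Cyto = c("Good" = "limegreen", "Intermediate" = "navy",
                "N.D." = "magenta", "Poor" = "black")
)

# Hypothetical variant: the annotation data frame is passed in by the caller.
# Assumes rows of `ann` follow the column order of `counts_matrix`.
make_annotated_heatmap <- function(counts_matrix, ann, col = colours) {
  stopifnot(nrow(ann) == ncol(counts_matrix))
  col_ann <- ComplexHeatmap::HeatmapAnnotation(df = ann, which = "column", col = col)
  hm <- ComplexHeatmap::Heatmap(counts_matrix,
                                name = "Z-score",
                                show_row_names = FALSE,
                                show_column_names = FALSE,
                                top_annotation = col_ann)
  ComplexHeatmap::draw(hm, annotation_legend_side = "right",
                       heatmap_legend_side = "right")
}

# Toy input (made-up values) to show the call
set.seed(1)
toy_mat <- matrix(rnorm(20), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))
toy_ann <- data.frame(FAB = c("M0", "M1", "M2", "M4"),
                      Risk_Cyto = c("Poor", "Good", "Intermediate", "Poor"))

# Only draw when ComplexHeatmap is installed
if (requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  make_annotated_heatmap(toy_mat, toy_ann)
}
```

With a named list of matrices and a matching list of annotation tables, `Map(make_annotated_heatmap, p2, ann_list)` would draw one heatmap per pair, where `ann_list` is a hypothetical list you build yourself.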
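
For reference, median centering in the preparation script can be done in base R with `sweep()`. This is a minimal sketch on made-up data; `median_center` is a hypothetical helper name, and whether to centre by the median or use the z-score from the question depends on the analysis.

```r
# Hypothetical helper: subtract each row's median from that row
median_center <- function(x) {
  sweep(x, 1, apply(x, 1, median, na.rm = TRUE), FUN = "-")
}

# Toy counts matrix (made-up values): 2 genes x 3 samples
counts <- matrix(c(1, 5, 9,
                   2, 2, 8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

centered <- median_center(counts)
centered
```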

list_of_files <- dir(pattern = 'MAP', full.names = TRUE)

#Further arguments to read.csv can be passed in ...
#all_csv <- lapply(list_of_files,read_delim,delim = "\t", escape_double = FALSE,trim_ws = TRUE)

all_csv <- lapply(list_of_files,read.table,strip.white = FALSE,check.names = FALSE,header=TRUE,row.names=1)
#my_names = c("gene","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj","UP_DOWN")
my_names = c("Symbol","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj","UP_DOWN")

#my_names = c['X2']

#my_names = c("Peak","annotation","ENSEMBL","log2FoldChange","padj","UP_DOWN")
result_abd = lapply(all_csv, FUN = function(x) subset(x, select=-c(1:7,155)))

# Name each element after its source file, reusing the file list that was actually read
names(result_abd) <- gsub(".txt","",
                          basename(list_of_files),
                          fixed = TRUE)

# Row-wise z-score: scale() works on columns, so transpose before and after
fun <- function(x) {
  t(scale(t(x)))
}

p2 <- lapply(result_abd, fun)

# list_of_files1 <- list.files('Model_hmap_meta/',pattern = '\\.txt$', full.names = TRUE)
# #Further arguments to read.csv can be passed in ...
# meta1 <- lapply(list_of_files1,read.table, row.names = 1,sep = "\t",header = TRUE)


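library(ComplexHeatmap)  # Heatmap(), HeatmapAnnotation(), draw()
library(circlize)        # colorRamp2()

make_heatmap <- function(counts_matrix) {

  # The same annotation table is used for every matrix
  metadata <- read.table("FAB_table.txt", sep = "\t", header = TRUE)

  ann <- data.frame(metadata$FAB, metadata$Risk_Cyto)
  colnames(ann) <- c('FAB', 'Risk_Cyto')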
  colours <- list('FAB' = c('M0' = 'red2', 'M1' = 'royalblue', 'M2'='gold','M3'='forestgreen','M4'='chocolate','M5'='Purple'),
                  'Risk_Cyto' = c('Good' = 'limegreen', 'Intermediate' = 'navy' , 'N.D.' ='magenta','Poor'='black'))
  colAnn <- HeatmapAnnotation(df = ann,
                              which = 'col',
                              col = colours,
                              annotation_width = unit(c(1, 4), 'cm'),
                              gap = unit(1, 'mm'))
  
  hm1 <- Heatmap(counts_matrix,
                 col= colorRamp2(c(-2.6,-1,0,1,2.6),c("blue","skyblue","white","lightcoral","red")),
                 
                 #heatmap_legend_param=list(at=c(-2.6,-1,0,1,2.6),color_bar="continuous",
                 #                         legend_direction="vertical", legend_width=unit(5,"cm"),
                 #                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")),
                 name = "Z-score",
                 
                 #Row annotation configurations
                 cluster_rows=T,
                 show_row_dend=FALSE,
                 row_title_side="right",
                 row_title_gp=gpar(fontsize=8),
                 show_row_names=FALSE,
                 row_names_side="left",
                 
                 #Column annotation configurations
                 cluster_columns=T,
                 show_column_dend=T,
                 column_title="DE genes",
                 column_title_side="top",
                 column_title_gp=gpar(fontsize=15, fontface="bold"),
                 show_column_names = FALSE,
                 column_names_gp = gpar(fontsize = 12, fontface="bold"),
                 
                 #Dendrogram configurations: columns
                 clustering_distance_columns="euclidean",
                 clustering_method_columns="complete",
                 column_dend_height=unit(10,"mm"),
                 
                 #Dendrogram configurations: rows
                 clustering_distance_rows="euclidean",
                 clustering_method_rows="complete",
                 row_dend_width=unit(4,"cm"),
                 row_dend_side = "left",
                 row_dend_reorder = TRUE,
                 
                 #Splits
                 border=T,
                 row_km = 1,
                 column_km = 1,
                 
                 #plot params
                 #width = unit(5, "inch"),
                 #height = unit(4, "inch"),
                 #height = unit(0.4, "cm")*nrow(mat),
                 
                 #Annotations
                 top_annotation = colAnn)
  
  # plot heatmap
  draw(hm1, annotation_legend_side = "right", heatmap_legend_side="right")
}

make_heatmap(as.matrix(p2[[1]])) #just call the function with the counts matrix
make_heatmap(as.matrix(p2[[2]]))

If you need to output the heatmap to a PDF or similar, you can open the device before calling the function, or you can put that command inside the heatmap function (just make sure to call dev.off() inside the function too in that case).
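
A sketch of the first option, opening the device outside the plotting function, is shown below. `save_heatmaps` and the output directory are hypothetical. The stand-in plotting function here uses base `image()` so the example runs without extra packages; in practice `make_heatmap` would be passed instead.

```r
# Hypothetical helper: write one PDF per matrix in a named list
save_heatmaps <- function(matrices, plot_fun, out_dir, width = 8, height = 10) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  out_files <- file.path(out_dir, paste0(names(matrices), ".pdf"))
  for (i in seq_along(matrices)) {
    pdf(out_files[i], width = width, height = height)
    # Close the device even if plotting fails, so no device is left open
    tryCatch(plot_fun(matrices[[i]]), finally = dev.off())
  }
  invisible(out_files)
}

# Toy matrices (made-up values) and a stand-in plotting function
set.seed(42)
mats <- list(modelA = matrix(rnorm(12), nrow = 3),
             modelB = matrix(rnorm(12), nrow = 3))
stand_in_plot <- function(m) image(t(m), main = "stand-in heatmap")

pdf_files <- save_heatmaps(mats, stand_in_plot,
                           out_dir = file.path(tempdir(), "heatmaps"))
basename(pdf_files)
```

With the objects from this answer, the call would look like `save_heatmaps(p2, make_heatmap, "heatmaps")`, which names each PDF after the corresponding element of `p2`.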
