
Heatmap with categorical variables and with phylogenetic tree in R

:)

I have a question that I could not answer by searching on my own. I would like to make a heatmap with categorical variables (a bit like the one in "heatmap-like plot, but for categorical variables"), and I would like to add a phylogenetic tree on the left side (like the one in "how to create a heatmap with a fixed external hierarchical cluster"). Ideally I would adapt the second one, since it looks much prettier! ;)

Here is my data:

  • a newick-formatted phylogenetic tree, with 3 species, let's say:

     ((1,2),3); 
  • a data frame:

     x <- c("species 1", "species 2", "species 3")
     y <- c("A", "A", "C")
     z <- c("A", "B", "A")
     df <- data.frame(x, y, z)

(A, B and C are the categories; in my case, for instance, gene presence, absence or duplication.)

Would you know how to do it?

Many thanks in advance!


EDIT: I would like to be able to choose the color of each category in the heatmap, rather than use a classic color gradient. Let's say A=green, B=yellow, C=red.

I actually figured it out by myself. For those that are interested, here is my script:

#load packages
library("ape")
library(gplots)

#retrieve tree in newick format with three species
mytree <- read.tree("sometreewith3species.tre")
mytree_brlen <- compute.brlen(mytree, method="Grafen") #gives an ultrametric tree (all tips at the same distance from the root), which as.hclust needs


#turn the phylo tree to a dendrogram object
hc <- as.hclust(mytree_brlen) #Compulsory step, as as.dendrogram doesn't have a method for phylo objects.
dend <- as.dendrogram(hc)
plot(dend, horiz=TRUE) #check how the dendrogram looks

#create a matrix with values of each category for each species
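a <- mytree_brlen$tip.label #species names, in the order of the tree tips
b <- c("gene1","gene2")
dims <- list(a,b) #row and column names (named dims so that list() is not masked)
values <- c(1,2,1,1,3,2)  #some values for the categories (1=A, 2=B, 3=C)
mat <- matrix(values, nrow=3, dimnames=dims) #Some random data to plot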

#plot the heatmap
heatmap.2(mat, Rowv=dend, Colv=NA, dendrogram='row',
          col=colorRampPalette(c("red","green","yellow"))(3), #with this data: 1=red, 2=green, 3=yellow
          sepwidth=c(0.01,0.02), sepcolor="black", colsep=1:ncol(mat), rowsep=1:nrow(mat),
          key=FALSE, trace="none",
          cexRow=2, cexCol=2, srtCol=45,
          margins=c(10,10),
          main="Gene presence, absence and duplication in three species")


#legend of heatmap
par(lend=2)           # square line ends for the color legend
legend("topright",      # location of the legend on the heatmap plot
   legend = c("gene absence", "1 copy of the gene", "2 copies"), # category  labels
   col = c("red", "green", "yellow"),  # color key
   lty= 1,             # line style
   lwd = 15            # line width
)

and here is the resulting figure :) [image]
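For anyone starting from the letter-coded data frame in the question, here is a minimal sketch of one way to turn it into the numeric matrix the script expects while pinning each category to its own color (A=green, B=yellow, C=red, as in the EDIT). The names `category_colours`, `category_levels` and `breaks` are mine, not from the original script, and the explicit `breaks` are an addition meant to keep the color mapping stable even when one category is missing from the data. The tree is left out here so the sketch runs without a tree file.

```r
# Data frame from the question: one row per species, one column per gene.
df <- data.frame(
  x = c("species 1", "species 2", "species 3"),
  y = c("A", "A", "C"),
  z = c("A", "B", "A")
)

# Assumed mapping, taken from the EDIT in the question.
category_colours <- c(A = "green", B = "yellow", C = "red")
category_levels  <- names(category_colours)

# Convert every categorical column to integer codes (A=1, B=2, C=3),
# using the same fixed levels for all columns.
mat <- sapply(df[, c("y", "z")], function(column) {
  as.integer(factor(column, levels = category_levels))
})
rownames(mat) <- as.character(df$x)

# One break on each side of every integer code, so code k gets colour k.
breaks <- seq(0.5, length(category_levels) + 0.5, by = 1)

# Plot only if gplots is installed; no dendrogram in this sketch.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat, Rowv = FALSE, Colv = FALSE, dendrogram = "none",
                    col = unname(category_colours), breaks = breaks,
                    key = FALSE, trace = "none", margins = c(8, 8))
}
```

To add the tree, `Rowv = dend` and `dendrogram = "row"` can be passed as in the script above. As far as I can tell, heatmap.2 matches rows to the dendrogram by position, so the rows of `mat` should then be in the same order as the tree's `tip.label`, with row names that correspond to the tip labels.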

I am trying to use the same syntax and the R packages ape, gplots and RColorBrewer to make a heatmap whose column dendrogram is essentially a species tree.

But I am unable to get beyond reading in my tre file. I get various errors when I run either of the following on the tree I read in: a) plot, or b) compute.brlen. In addition, c) plot after collapse.singles gives a species tree whose topology looks totally mangled.

I suspect there is something wrong with my tre input, but I am not sure what. Would you happen to understand what is wrong and how I could fix it? Thank you!

(((((((((((((Mt3.5v5, Mt4.0v1), Car), (((Pvu186, Pvu218), (Gma109, Gma189)), Cca))), (((Ppe139, Mdo196), Fve226), Csa122)), ((((((((Ath167, Aly107), Cru183), (Bra197, Tha173)), Cpa113), (Gra221, Tca233)), (Csi154, (Ccl165, Ccl182))), ((Mes147, Rco119),(Lus200, (Ptr156, Ptr210)))), Egr201)), Vvi145), ((Stu206, Sly225), Mgu140)), Aco195), (((Sbi79, Zma181),(Sit164, Pvi202)), (Osa193, Bdi192))), Smo91), Ppa152), (((Cre169, Vca199), Csu227), ((Mpu228, Mpu229), Olu231)));
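
Not a confirmed diagnosis, but two things stand out in the Newick string above: the clade that ends in `Cca)))` appears to be wrapped in one more pair of parentheses than it needs, which would create a node with a single child, and the tree has no branch lengths. Below is a minimal sketch of checks that could narrow this down, run on a hypothetical toy tree with the same kind of extra parentheses rather than on the real file. The names `toy`, `fixed` and `fixed_brlen` are made up for illustration.

```r
library(ape)

# Hypothetical toy tree: the extra parentheses around (A,B) create a node
# with only one child, similar to what the real tree seems to contain.
toy <- read.tree(text = "(((A,B)),C);")

# Check for single-child nodes, then remove them.
has.singles(toy)
fixed <- collapse.singles(toy)

# as.hclust() on a phylo object needs a rooted, binary, ultrametric tree.
is.rooted(fixed)
is.binary(fixed)

# The input has no branch lengths, so compute ultrametric ones first.
fixed_brlen <- compute.brlen(fixed, method = "Grafen")
is.ultrametric(fixed_brlen)

# Same conversion as in the script above.
dend <- as.dendrogram(as.hclust(fixed_brlen))
plot(dend, horiz = TRUE)
```

If any of these checks fails on the real tree, that is probably the step to look at first. For a column dendrogram, heatmap.2 takes `Colv = dend` with `dendrogram = "column"`, and the species would then need to be the columns of the matrix, in `tip.label` order.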


 