How to collapse branches in a phylogenetic tree by the label in their nodes or leaves?

I have built a phylogenetic tree for a protein family that can be split into different groups, classifying each one by its type of receptor or type of response. The nodes in the tree are labeled as the type of receptor.

In the phylogenetic tree I can see that proteins that belong to the same groups or type of receptor have clustered together in the same branches. So I would like to collapse these branches that have labels in common, grouping them by a given list of keywords.

The command would be something like this:

./collapse_tree_by_label -f phylogenetic_tree.newick -l list_of_labels_to_collapse.txt -o collapsed_tree.eps (or pdf)

My list_of_labels_to_collapse.txt would be like this: A B C D

My newick tree would be like this: (A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2)
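As a rough illustration of the grouping step, here is a minimal base R sketch that assigns tip labels to the keywords in the list. It assumes every label has the form keyword, underscore, number (as in the tree above); the variable names are only illustrative.

```r
# Keywords as they would be read from list_of_labels_to_collapse.txt
keywords <- c("A", "B", "C", "D")

# Tip labels taken from the example tree
tip_labels <- c("A_1", "A_2", "A_3", "A_4",
                "B_1", "B_2", "B_3",
                "C_1", "C_2",
                "D_1", "D_2", "D_3", "D_4", "D_5")

# Assumption: the group keyword is everything before the first underscore
prefix <- sub("_.*$", "", tip_labels)

# One vector of tip labels per keyword; labels whose prefix is not a
# keyword get NA as their factor level and are dropped by split()
groups <- split(tip_labels, factor(prefix, levels = keywords))
print(groups)
```

Matching on the whole prefix rather than with an unanchored grepl() avoids a keyword such as "A" accidentally matching a label that merely contains that letter.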

The output image without collapsing is like this: http://i.stack.imgur.com/pHkoQ.png

The output image after collapsing should be like this (collapsed_tree.eps): http://i.stack.imgur.com/TLXd0.png

The width of the triangles should represent the branch length, and the height of the triangles must represent the number of nodes in the branch.
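To make the two triangle dimensions concrete, here is a small sketch using ape for the "A" group of the example tree. It takes "branch length" to mean the distance from the clade's MRCA to its deepest tip, which is one possible reading of the requirement, not the only one.

```r
library(ape)

tree <- read.tree(text = "((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);")

# Distance from the root to every tip and internal node
depth <- node.depth.edgelength(tree)

# Tips of group A and their most recent common ancestor
tips_A <- grep("^A_", tree$tip.label)
mrca_A <- getMRCA(tree, tips_A)

# Triangle height: number of tips in the group
tri_height <- length(tips_A)

# Triangle width (assumed definition): MRCA to deepest tip of the group
tri_width <- max(depth[tips_A]) - depth[mrca_A]

print(c(height = tri_height, width = tri_width))
```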

I have been playing with the "ape" package in R. I was able to plot a phylogenetic tree, but I still can't figure out how to collapse the branches by keywords in the labels:

require("ape")

This will load the tree:

cat("((A_1:0.05,A_2:0.03,A_3:0.2,A_4:0.1):0.9,(((B_1:0.05,B_2:0.02,B_3:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")

Here should be the code to collapse

This will plot the tree:

plot(tree.test)

Your tree as it is stored in R already has the tips stored as polytomies. It's just a matter of plotting the tree with triangles representing the polytomies.

There is no function in ape to do this that I am aware of, but if you mess with the plotting function a little bit you can pull it off:

# Step 1: make edges for descendent nodes invisible in plot:
groups <- c("A", "B", "C", "D")
group_edges <- numeric(0)
for(group in groups){
  group_edges <- c(group_edges,getMRCA(tree.test,tree.test$tip.label[grepl(group, tree.test$tip.label)]))
}
edge.width <- rep(1, nrow(tree.test$edge))
edge.width[tree.test$edge[,1] %in% group_edges ] <- 0


# Step 2: plot the tree with the hidden edges
plot(tree.test, show.tip.label = F, edge.width = edge.width)

# Step 3: add triangles
add_polytomy_triangle <- function(phy, group){
  root <- length(phy$tip.label)+1
  group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
  group_nodes <- which(phy$tip.label %in% group_node_labels)
  group_mrca <- getMRCA(phy,group_nodes)

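  # x = distance from the root, y = tip index; both tip corners use the
  # first tip's x, so the base of the triangle is vertical
  node_dist <- dist.nodes(phy)
  tip_coord1 <- c(node_dist[root, group_nodes[1]], group_nodes[1])
  tip_coord2 <- c(node_dist[root, group_nodes[1]], group_nodes[length(group_nodes)])
  node_coord <- c(node_dist[root, group_mrca], mean(c(tip_coord1[2], tip_coord2[2])))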

  xcoords <- c(tip_coord1[1], tip_coord2[1], node_coord[1])
  ycoords <- c(tip_coord1[2], tip_coord2[2], node_coord[2])
  polygon(xcoords, ycoords)
}

Then you just have to loop through the groups to add the triangles:

for(group in groups){
  add_polytomy_triangle(tree.test, group)
}

I think the script is finally doing what I wanted. Starting from the answer that @CactusWoman provided, I changed the code a little bit so the script will try to find the MRCA that represents the largest branch that matches my search pattern. This solved the problem of not merging non-polytomic branches, or collapsing the whole tree because one matching node was mistakenly outside the correct branch.

In addition, I included a parameter that represents the limit for the pattern abundance ratio in a given branch, so we can select and collapse/group branches that have at least 90% of their tips matching the search pattern, for example.
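As a small illustration of the abundance ratio, the sketch below computes it for one clade using ape only. The toy tree and the 0.90 threshold are assumptions chosen for the example; a clade with one stray tip (E_2) among four gives a ratio of 0.75 and would not be collapsed at that threshold.

```r
library(ape)

# Toy tree: the "A" clade contains one non-matching tip, E_2
tree <- read.tree(text = "((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1):0.9,(D_1:0.3,D_2:0.4):1.2);")

pattern <- "^A_"
threshold <- 0.90

# MRCA of all tips that match the pattern
matching_tips <- grep(pattern, tree$tip.label)
node <- getMRCA(tree, matching_tips)

# All tips below that MRCA, matching or not
clade_tips <- extract.clade(tree, node)$tip.label

# Fraction of the clade's tips that match the pattern
match_ratio <- mean(grepl(pattern, clade_tips))
collapse <- match_ratio >= threshold

print(match_ratio)
print(collapse)
```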

library(geiger)
library(phylobase)
library(ape)

#functions
find_best_mrca <- function(phy, group, threshold){

     group_matches <- phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)]
     group_mrca <- getMRCA(phy,phy$tip.label[grepl(group, phy$tip.label, ignore.case=TRUE)])
     group_leaves <- tips(phy, group_mrca)
     match_ratio <- length(group_matches)/length(group_leaves)

      if( match_ratio < threshold){

           #search the descendant nodes for one whose ratio of matching tips reaches the threshold
           mrca_children <- descendants(as(phy,"phylo4"), group_mrca, type="all")
           i <- 1
           new_ratios <- NULL
           nleaves <- NULL
           names(mrca_children) <- NULL

           for(new_mrca in mrca_children){
                child_leaves <- tips(phy, new_mrca)
                child_matches <- grep(group, child_leaves, ignore.case=TRUE)
                new_ratios[i] <- length(child_matches)/length(child_leaves)
                nleaves[i] <- length(tips(phy, new_mrca))
                i <- i+1
           }



           match_result <- data.frame(mrca_children, new_ratios, nleaves)


           match_result_sorted <- match_result[order(-match_result$nleaves,match_result$new_ratios),]
           print(match_result_sorted)

           #return the largest descendant clade whose ratio reaches the threshold
           for(line in seq_len(nrow(match_result_sorted))){
                 if(match_result_sorted$new_ratios[line]>=threshold){
                     return(match_result_sorted$mrca_children[line])
                 }
           }

           #no descendant clade reached the threshold
           return(0)
      }else{return(group_mrca)}




}

add_triangle <- function(phy, group,phylo_plot){

     group_node_labels <- phy$tip.label[grepl(group, phy$tip.label)]
     group_mrca <- getMRCA(phy,group_node_labels)
     group_nodes <- descendants(as(phy,"phylo4"), group_mrca, type="tips")
     names(group_nodes) <- NULL

     x<-phylo_plot$xx
     y<-phylo_plot$yy


     x1 <- max(x[group_nodes])
     x2 <-max(x[group_nodes])
     x3 <- x[group_mrca]

     y1 <- min(y[group_nodes])
     y2 <- max(y[group_nodes])
     y3 <-  y[group_mrca]

     xcoords <- c(x1,x2,x3)
     ycoords <- c(y1,y2,y3)

     polygon(xcoords, ycoords)

     return(c(x2,y3))

}



#main

  cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")
tree.test <- read.tree("ex.tre")


# Step 1: Find the best MRCA that matches the keywords or search pattern

groups <- c("A", "B|C", "D")
group_labels <- groups

group_edges <- numeric(0)
edge.width <- rep(1, nrow(tree.test$edge))
count <- 1


for(group in groups){

    best_mrca <- find_best_mrca(tree.test, group, 0.90)

    group_leaves <- tips(tree.test, best_mrca)

    groups[count] <- paste(group_leaves, collapse="|")
    group_edges <- c(group_edges,best_mrca)

    #Step2: Remove the edges of the branches that will be collapsed, so they become invisible
    edge.width[tree.test$edge[,1] %in% c(group_edges[count],descendants(as(tree.test,"phylo4"), group_edges[count], type="all")) ] <- 0
    count = count +1

}


#Step 3: plot the tree hiding the branches that will be collapsed/grouped

last_plot.phylo <- plot(tree.test, show.tip.label = F, edge.width = edge.width)

#And save a copy of the plot so we can extract the xy coordinates of the nodes
#To get the x & y coordinates of a plotted tree created using plot.phylo
#or plotTree, we can steal from inside tiplabels:
last_phylo_plot<-get("last_plot.phylo",envir=.PlotPhyloEnv)

#Step 4: Add triangles and labels to the collapsed nodes
for(i in 1:length(groups)){

  text_coords <- add_triangle(tree.test, groups[i],last_phylo_plot)

  text(text_coords[1],text_coords[2],labels=group_labels[i], pos=4)

}
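The coordinate lookup used in Step 3 can be tried on its own. This is a minimal sketch on a hypothetical three-tip tree; it plots to a null PDF device so that no window or file is created, and then reads the x and y positions that plot.phylo stored for every tip and internal node.

```r
library(ape)

# Hypothetical toy tree: 3 tips, 2 internal nodes
tree <- read.tree(text = "((A_1:1,A_2:1):1,B_1:2);")

# Draw to a null device so the example works without a display
pdf(NULL)
plot(tree)

# plot.phylo stores its layout in ape's .PlotPhyloEnv environment
pp <- get("last_plot.phylo", envir = .PlotPhyloEnv)
dev.off()

# xx and yy hold one entry per tip, followed by one per internal node
coords <- data.frame(node = seq_along(pp$xx), x = pp$xx, y = pp$yy)
print(coords)
```

The rows are indexed by node number, so the tips come first (1 to the number of tips), which is why the triangle function can index x and y directly with tip and MRCA node numbers.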

I've also been searching for this kind of tool for ages, not so much for collapsing categorical groups, but for collapsing internal nodes based on a numerical support value.

The di2multi function in the ape package can collapse nodes to polytomies, but it currently can only do this by branch length threshold. Here is a rough adaptation that allows collapsing by a node support value threshold instead (default threshold = 0.5).

Use at your own risk, but it works for me on my rooted Bayesian tree.

di2multi4node <- function (phy, tol = 0.5) 
  # Adapted di2multi function from the ape package to plot polytomies
  # based on numeric node support values
  # (di2multi does this based on edge lengths)
  # Needs adjustment for unrooted trees as currently skips the first edge
{
  if (is.null(phy$edge.length)) 
    stop("the tree has no branch length")
  if (is.null(phy$node.label))
    stop("the tree has no node labels")
  if (is.na(as.numeric(phy$node.label[2])))
    stop("node labels can't be converted to numeric values")
  ind <- which(phy$edge[, 2] > length(phy$tip.label))[as.numeric(phy$node.label[2:length(phy$node.label)]) < tol]
  n <- length(ind)
  if (!n) 
    return(phy)
  foo <- function(ancestor, des2del) {
    wh <- which(phy$edge[, 1] == des2del)
    for (k in wh) {
      if (phy$edge[k, 2] %in% node2del) 
        foo(ancestor, phy$edge[k, 2])
      else phy$edge[k, 1] <<- ancestor
    }
  }
  node2del <- phy$edge[ind, 2]
  anc <- phy$edge[ind, 1]
  for (i in 1:n) {
    if (anc[i] %in% node2del) 
      next
    foo(anc[i], node2del[i])
  }
  phy$edge <- phy$edge[-ind, ]
  phy$edge.length <- phy$edge.length[-ind]
  phy$Nnode <- phy$Nnode - n
  sel <- phy$edge > min(node2del)
  for (i in which(sel)) phy$edge[i] <- phy$edge[i] - sum(node2del < 
                                                           phy$edge[i])
  if (!is.null(phy$node.label)) 
    phy$node.label <- phy$node.label[-(node2del - length(phy$tip.label))]
  phy
}
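For comparison, this is how the stock di2multi from ape behaves: it collapses internal branches shorter than tol into polytomies. The four-tip tree below is a made-up example with one very short internal branch.

```r
library(ape)

# Hypothetical tree: the branch leading to the (A,B) clade is only 0.001 long
tree <- read.tree(text = "(((A:1,B:1):0.001,C:1):1,D:2);")

# Collapse every internal branch shorter than 0.01
collapsed <- di2multi(tree, tol = 0.01)

# The (A,B) node is merged into its parent, leaving an A/B/C polytomy
print(tree$Nnode)
print(collapsed$Nnode)
```

The adapted function above follows the same mechanics, but selects the edges to remove from the numeric node labels instead of the edge lengths.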

This is my answer based on the phytools::phylo.toBackbone function; see http://blog.phytools.org/2013/09/even-more-on-plotting-subtrees-as.html and http://blog.phytools.org/2013/10/finding-edge-lengths-of-all-terminal.html . First, load the function at the end of the code.

library(ape)
library(phytools)  #phylo.toBackbone
library(phangorn) 

cat("((A_1:0.05,E_2:0.03,A_3:0.2,A_4:0.1,A_5:0.1,A_6:0.1,A_7:0.35,A_8:0.4,A_9:01,A_10:0.2):0.9,((((B_1:0.05,B_2:0.05):0.5,B_3:0.02,B_4:0.04):0.6,(C_1:0.6,C_2:0.08):0.7):0.5,(D_1:0.3,D_2:0.4,D_3:0.5,D_4:0.7,D_5:0.4):1.2):0.5);", file = "ex.tre", sep = "\n")

phy <- read.tree("ex.tre")
groups <- c("A", "B|C", "D") 

backboneoftree<-makebackbone(groups,phy)
#   tip.label clade.label  N     depth
# 1       A_1           A 10 0.2481818
# 2       B_1         B|C  6 0.9400000
# 3       D_1           D  5 0.4600000

par(mfrow=c(2,2), mar=c(0,2,2,0) )
plot(phy, main="Complete tree" )
plot(backboneoftree)

makebackbone<-function(groupings,phy){ 
  listofspecies<-phy$tip.label
  listtopreserve<-list()
  lengthofclades<-list()
  meandistnode<-list()
  newedgelengths<-list()
  for (i in 1:length(groupings)){
    # MRCA of all tips whose labels match the i-th pattern
    bestmrca<-getMRCA(phy,grep(groupings[i], phy$tip.label) )
    mrcatips<-phy$tip.label[unlist(phangorn::Descendants(phy,bestmrca, type="tips") )]
    listtopreserve[i]<- mrcatips[1]
    meandistnode[i]<- mean(dist.nodes(phy)[unlist(lapply(mrcatips,  
    function(x) grep(x, phy$tip.label) ) ),bestmrca] )
    lengthofclades[i]<-length(mrcatips)
    provtree<-drop.tip(phy,mrcatips, trim.internal=F, subtree = T)
    n3<-length(provtree$tip.label)
    newedgelengths[i]<-setNames(provtree$edge.length[sapply(1:n3,function(x,y) 
      which(y==x),y=provtree$edge[,2])],
      provtree$tip.label)[provtree$tip.label[grep("tips",provtree$tip.label)] ]
  }  
  newtree<-drop.tip(phy,setdiff(listofspecies,unlist(listtopreserve)), 
                    trim.internal = T)
  n<-length(newtree$tip.label)
  newtree$edge.length[sapply(1:n,function(x,y) 
    which(y==x),y=newtree$edge[,2])]<-unlist(newedgelengths)+unlist(meandistnode)
  trans<-data.frame(tip.label=newtree$tip.label,clade.label=groupings,
                    N=unlist(lengthofclades), depth=unlist(meandistnode) )
  rownames(trans)<-NULL
  print(trans)
  backboneoftree<-phytools::phylo.toBackbone(newtree,trans)
  return(backboneoftree)
}

EDIT: I haven't tried this, but it might be another answer: "Script and function to transform the tip branches of a tree, i.e. the thickness or to triangles, with the width of both correlating with certain parameters (e.g., species number of the clade) (tip.branches.R)" http://www.sysbot.biologie.uni-muenchen.de/en/people/cusimano/use_r.html http://www.sysbot.biologie.uni-muenchen.de/en/people/cusimano/tip.branches.R

This doesn't address depicting the clades as triangles, but it does help with collapsing low-support nodes. The library ggtree has a function as.polytomy which can be used to collapse nodes based on support values.

For example, to collapse bootstraps less than 50%, you'd use:

polytree = as.polytomy(raxtree, feature='node.label', fun=function(x) as.numeric(x) < 50)
