How to append bootstrapped values of cluster's (tree) nodes in NEWICK format in R
I want to make a tree (cluster) using the web-based Interactive Tree of Life tool (iTOL). As an input file (or string), this tool uses the Newick format, which is a way of representing graph-theoretical trees with edge lengths using parentheses and commas. Besides that, additional information may be supported, such as bootstrap values for the cluster's nodes.
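To make the format concrete, here is a minimal sketch that parses a tiny, made-up Newick string with ape's read.tree. The three-tip tree and the support value 90 are hypothetical, chosen only for illustration. A value written right after a closing parenthesis is read as the label of that internal node, and the numbers after each colon are branch lengths.

```r
library(ape)

# Hypothetical three-tip tree: (A,B) is a clade with support 90
newick <- "((A:1,B:1)90:0.5,C:1.5);"
tr <- read.tree(text = newick)

tr$tip.label    # names of the leaves
tr$node.label   # labels of the internal nodes (the root has an empty label here)
tr$edge.length  # branch lengths, in the row order of tr$edge
```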
For example, here I created a dataset for a cluster analysis using the clusterGeneration package:
library(clusterGeneration)
set.seed(1)
tmp1 <- genRandomClust(numClust=3, sepVal=0.3, numNonNoisy=5,
numNoisy=3, numOutlier=5, numReplicate=2, fileName="chk1")
data <- tmp1$datList[[2]]
Afterwards, I performed a cluster analysis and assessed the support for the cluster's nodes by bootstrap using the pvclust package:
set.seed(2)
y <- pvclust(data=data,method.hclust="average",method.dist="correlation",nboot=100)
plot(y)
Here is the cluster with its bootstrap values:
In order to make a Newick file, I used the ape package:
library(ape)
yy<-as.phylo(y$hclust)
write.tree(yy,digits=2)
The write.tree function prints the tree in Newick format:
((x2:0.45,x6:0.45):0.043,((x7:0.26,(x4:0.14,(x1:0.14,x3:0.14):0.0064):0.12):0.22,(x5:0.28,x8:0.28):0.2):0.011);
Those numbers represent branch lengths (the cluster's edge lengths). Following the instructions on the iTOL help page ("Uploading and working with your own trees" section), I manually added the bootstrap values to my Newick file (the values right after each closing parenthesis below):
((x2:0.45,x6:0.45)74:0.043,((x7:0.26,(x4:0.14,(x1:0.14,x3:0.14)55:0.0064)68:0.12)100:0.22,(x5:0.28,x8:0.28)100:0.2)63:0.011);
It works fine when I upload the string into iTOL. However, I have a huge cluster, and doing this by hand is tedious.
The bootstrap values can be obtained with:
(round(y$edges,2)*100)[,1:2]
The branch lengths used to form the Newick file can be obtained with:
yy$edge.length
I tried to figure out how the write.tree function works by debugging it. However, I noticed that it internally calls the function .write.tree2, and I couldn't work out how to change the original code efficiently so that the bootstrap values end up in the appropriate positions in the Newick file.
Any suggestions are welcome.
Here is one solution for you: objects of class phylo have an optional element called node.label that, appropriately, gives you the label of each node. You can use it to store your bootstrap values. They will be written to your Newick file at the appropriate place, as you can see in the code of .write.tree2:
> .write.tree2
function (phy, digits = 10, tree.prefix = "")
{
    brl <- !is.null(phy$edge.length)
    nodelab <- !is.null(phy$node.label)
    ...
    if (is.null(phy$root.edge)) {
        cp(")")
        if (nodelab)
            cp(phy$node.label[1])
        cp(";")
    }
    else {
        cp(")")
        if (nodelab)
            cp(phy$node.label[1])
        cp(":")
        cp(sprintf(f.d, phy$root.edge))
        cp(";")
    }
    ...
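As a small illustration of that behaviour, the sketch below sets node.label on a toy tree and writes it back out. The tree and the label 90 are made up for the example. It assumes, as is the convention in ape, that internal nodes are numbered from Ntip + 1 (the root) upward and that node.label follows that numbering.

```r
library(ape)

# Hypothetical three-tip tree without any node labels
tr <- read.tree(text = "((A:1,B:1):0.5,C:1.5);")

# One label per internal node: the root first, then the (A,B) clade.
# An empty string leaves the root unlabelled.
tr$node.label <- c("", "90")

out <- write.tree(tr)
print(out)  # the label should appear right after the ")" closing the (A,B) clade
```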
The real difficulty is finding the proper order of the nodes. I searched and searched but couldn't find a way to recover the right order a posteriori, which means we have to capture that information during the conversion from an object of class hclust to an object of class phylo.
Luckily, if you look into the function as.phylo.hclust, there is a vector containing the node indices in their correct order with respect to the original hclust object:
> as.phylo.hclust
function (x, ...)
{
    N <- dim(x$merge)[1]
    edge <- matrix(0L, 2 * N, 2)
    edge.length <- numeric(2 * N)
    node <- integer(N) #<-This one
    ...
This means we can write our own version of as.phylo.hclust with a nodenames parameter, as long as nodenames is in the same order as the nodes in the hclust object (which is the case in your example, since pvclust keeps a coherent order internally, i.e. the order of the nodes in the hclust object is the same as in the table from which you picked the bootstrap values):
# NB: in the following function definition I only modified the commented lines
as.phylo.hclust.with.nodenames <- function (x, nodenames, ...) #We add a nodenames argument
{
    N <- dim(x$merge)[1]
    edge <- matrix(0L, 2 * N, 2)
    edge.length <- numeric(2 * N)
    node <- integer(N)
    node[N] <- N + 2L
    cur.nod <- N + 3L
    j <- 1L
    for (i in N:1) {
        edge[j:(j + 1), 1] <- node[i]
        for (l in 1:2) {
            k <- j + l - 1L
            y <- x$merge[i, l]
            if (y > 0) {
                edge[k, 2] <- node[y] <- cur.nod
                cur.nod <- cur.nod + 1L
                edge.length[k] <- x$height[i] - x$height[y]
            }
            else {
                edge[k, 2] <- -y
                edge.length[k] <- x$height[i]
            }
        }
        j <- j + 2L
    }
    if (is.null(x$labels))
        x$labels <- as.character(1:(N + 1))
    node.lab <- nodenames[order(node)] #Here we define our node labels
    obj <- list(edge = edge, edge.length = edge.length/2, tip.label = x$labels,
        Nnode = N, node.label = node.lab) #And you put them in the final object
    class(obj) <- "phylo"
    reorder(obj)
}
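The key line is nodenames[order(node)]. Here is a minimal base-R sketch of what it does, using made-up values for a four-tip tree (N = 3 merges) rather than the output of a real clustering. node[i] holds the phylo node number assigned to row i of the hclust merge matrix, so sorting the labels by node puts them in the order node.label expects (root first).

```r
# Hypothetical values, for illustration only
node <- c(7L, 6L, 5L)        # merge 1 -> node 7, merge 2 -> node 6, merge 3 -> node 5 (root)
nodenames <- c(55, 68, 100)  # made-up support values, one per merge row

# order(node) lists the merge rows by increasing phylo node number,
# so the result is the labels for nodes 5, 6, 7 in that order
node.lab <- nodenames[order(node)]
node.lab
```

In this toy case the last merge becomes the root, so its value comes first in node.lab.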
In the end, here is how you would use this new function in your case study:
bootstraps <- (round(y$edges,2)*100)[,1:2]
yy<-as.phylo.hclust.with.nodenames(y$hclust, nodenames=bootstraps[,2])
write.tree(yy,tree.names=TRUE,digits=2)
[1] "((x5:0.27,x8:0.27)100:0.24,((x7:0.25,(x4:0.14,(x1:0.13,x3:0.13)61:0.014)99:0.11)100:0.23,(x2:0.46,x6:0.46)56:0.022)61:0.027)100;"
#See the bootstraps ^^^ here for instance
plot(yy,show.node.label=TRUE) #To show that the order is correct
plot(y) #To compare with (here I used the yellow value)