
Collapse branches in a phylogenetic tree based on partially matching taxa labels in R

I have built a phylogenetic tree for a bacterial DNA region, in which sequences from the same bacterial species generally cluster together on neighbouring branches. Now I would like to collapse branches whose labels have something in common. I tried to define the labels to collapse using the following keywords, which partially match the names of the terminal taxa:

Keywords:

("vulneris","ulcerans","blattae","coli","hermannii","albertii","periodonticum","fergusonii")

In R, I read in the following file.newick:

(((((((((E_vulneris_otu44:0.03924,((E_vulneris_otu97:0.00766,
E_vulneris_otu96:0)0.8:0.00914,E_fergusonii_otu74:0.00725)0:0.0072)0:0,
((E_vulneris_otu95:0,
(((gi_undefined_HMPREF0402_04011_HMPREF0402_04011_E_ulcerans:0,
fig_768594rna24_RO08_01535_E_vulneris:0)0:0.00373,
(gi_undefined_HMPREF1766_00665_HMPREF1766_00665_E_vulneris:0,
fig_768595rna53_CBG60_05850_E_vulneris:0)0:0.00373)0.8:0.00701,
fig_7685910rna43_CI114_11510_E_vulneris:0)0.84:0.00717)0:0,
E_fergusonii_otu78:0.0072)0.85:0.00718)0:0,E_vulneris_otu94:0)0.82:0.00753,
E_vulneris_otu77:0)0.82:0.00698,(E_vulneris_otu93:0,((E_vulneris_otu89:0,
E_vulneris_otu90:0.00754)0:0.00765,E_vulneris_otu91:0)0.83:0.01608)0:0)
0.8:0.02319,(((E_vulneris_otu35:0,E_vulneris_otu34:0.00752)0.83:0.00766,
E_vulneris_otu28:0.00688)0:0,(E_vulneris_otu2:0.01715,E_vulneris_otu1:0)
0.89:0.01482)0.8:0.01541)0.89:0.02013,E_periodonticum_otu73:0)0.75:0.01535,
fig_86016rna55_CTM98_06410_E_periodonticum:0.00831)0.97:0.1808,
((((((E_blattae_otu76:0,E_blattae_otu75:0.01744)0.82:0.00698,
(E_blattae_otu4:0.00771,E_blattae_otu39:0)0.8:0.00762)0:0,
((gi_undefined_HMPREF1540_00319_HMPREF1540_00319_E_vulneris:0,
fig_8616rna58_DXA30_07775_E_ulcerans:0)0.81:0.00724,
gi_undefined_C4N16_02505_E_albertii:0)0.92:0.01676)0.78:0.01261,
E_blattae_otu92:0.004)0.78:0.02469,(((E_coli_otu8:0.01561,
E_coli_otu38:0.00378)0:0.00378,E_coli_otu33:0)0:0,
(((E_coli_otu54:0.00713,gi_undefined_C4N19_02700_E_coli:0)
0.73:0.00675,(((E_coli_otu57:0,E_coli_otu43:0.00715)0.84:0.00715,
E_coli_otu53:0)0.79:0.00852,((((E_coli_otu40:0,
E_coli_otu56:0.0076)0:0.00376,E_coli_otu55:0.00703)0:0.00376,
E_coli_otu37:0)0:0.0028,(E_coli_otu41:0,E_coli_otu4:0.00715)
0.9:0.00714)0:0.00395)0.79:0.00862)0.77:0.00764,E_coli_otu36:0)
0.82:0.00761)0.89:0.04396)0.83:0.0832,(gi_undefined_C4N18_07110_E_blattae:0,
gi_undefined_FUSO3_01390_E_hermannii:0.04598)0.92:0.1457)0.97:0.1015);

library(ape)
tree.test <- read.tree(file = "file.newick")

and plot the tree. I am working with the ape and phytools packages; the plotting call below is from ggtree:

library(ggtree)
ggtree(tree.test) + geom_tiplab()

but I cannot figure out how to collapse at the keyword level. Any suggestions would be much appreciated. Thank you!

One way to do that is to keep a single OTU in each species group and drop all the others, using the ape::drop.tip function:

library(ape)

## List of clades
clades <- c("vulneris","ulcerans","blattae","coli","hermannii","albertii","periodonticum","fergusonii")

## New tree placeholder
trimmed_tree <- tree.test

## Loop through each keyword
for (one_clade in clades) {
    ## Find the indices of the tips whose label contains the keyword
    species <- grep(one_clade, trimmed_tree$tip.label, fixed = TRUE)
    ## Drop every matching tip except the first one
    trimmed_tree <- drop.tip(trimmed_tree, trimmed_tree$tip.label[species[-1]])
}

## Display the trimmed tree (one tip per keyword)
plot(trimmed_tree)
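
Note that drop.tip removes the matching tips wherever they sit in the tree, so it may be worth checking first whether the tips behind each keyword really form a single clade. In the Newick file above, for instance, some E_vulneris labels sit next to E_ulcerans and E_blattae tips. The following is only a minimal sketch on a made-up toy tree (toy_tree, its labels, and the shortened keyword list are assumptions for illustration, not the data from the question). It uses ape::is.monophyletic for the check and then relabels each surviving tip with its keyword:

```r
library(ape)

## Hypothetical toy tree standing in for tree.test
toy_tree <- read.tree(text = paste0(
    "(((E_coli_otu1:1,E_coli_otu2:1):1,E_blattae_otu1:2):1,",
    "(E_vulneris_otu1:1,E_vulneris_otu2:1):2);"
))
keywords <- c("coli", "blattae", "vulneris")

## For each keyword, test whether the matching tips form one clade
is_clade <- vapply(keywords, function(k) {
    tips <- grep(k, toy_tree$tip.label, fixed = TRUE, value = TRUE)
    ## A single tip is trivially a clade
    length(tips) < 2 || is.monophyletic(toy_tree, tips)
}, logical(1))
print(is_clade)

## Keep one tip per keyword and rename it to the keyword
trimmed <- toy_tree
for (k in keywords) {
    hits <- grep(k, trimmed$tip.label, fixed = TRUE)
    if (length(hits) > 1) {
        ## Drop every matching tip except the first one
        trimmed <- drop.tip(trimmed, trimmed$tip.label[hits[-1]])
    }
    ## Relabel the surviving tip (assumes no keyword is a substring of another)
    trimmed$tip.label[grep(k, trimmed$tip.label, fixed = TRUE)] <- k
}

plot(trimmed)
```

Keywords for which is_clade is FALSE are scattered across the tree, so the single tip kept for them stands in for several separate lineages and the trimmed tree should be interpreted with that in mind.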
