简体   繁体   中英

Collapse branches in a phylogenetic tree based on partially matching taxa labels in R

I have built a phylogenetic tree for a DNA bacterial region, where same bacterial species, in general, clustered together in close branches. Now, I would like to collapse branches, which have labels in common. I tried to define labels to collapse based on the following keywords which partially match names of terminal taxa:

keywords:

("vulneris","ulcerans","blattae","coli","hermannii","albertii","periodonticum","fergusonii")

In R, I upload the following file.newick:

(((((((((E_vulneris_otu44:0.03924,((E_vulneris_otu97:0.00766,
E_vulneris_otu96:0)0.8:0.00914,E_fergusonii_otu74:0.00725)0:0.0072)0:0,
((E_vulneris_otu95:0,
(((gi_undefined_HMPREF0402_04011_HMPREF0402_04011_E_ulcerans:0,
fig_768594rna24_RO08_01535_E_vulneris:0)0:0.00373,
(gi_undefined_HMPREF1766_00665_HMPREF1766_00665_E_vulneris:0,
fig_768595rna53_CBG60_05850_E_vulneris:0)0:0.00373)0.8:0.00701,
fig_7685910rna43_CI114_11510_E_vulneris:0)0.84:0.00717)0:0,
E_fergusonii_otu78:0.0072)0.85:0.00718)0:0,E_vulneris_otu94:0)0.82:0.00753,
E_vulneris_otu77:0)0.82:0.00698,(E_vulneris_otu93:0,((E_vulneris_otu89:0,
E_vulneris_otu90:0.00754)0:0.00765,E_vulneris_otu91:0)0.83:0.01608)0:0)
0.8:0.02319,(((E_vulneris_otu35:0,E_vulneris_otu34:0.00752)0.83:0.00766,
E_vulneris_otu28:0.00688)0:0,(E_vulneris_otu2:0.01715,E_vulneris_otu1:0)
0.89:0.01482)0.8:0.01541)0.89:0.02013,E_periodonticum_otu73:0)0.75:0.01535,
fig_86016rna55_CTM98_06410_E_periodonticum:0.00831)0.97:0.1808,
((((((E_blattae_otu76:0,E_blattae_otu75:0.01744)0.82:0.00698,
(E_blattae_otu4:0.00771,E_blattae_otu39:0)0.8:0.00762)0:0,
((gi_undefined_HMPREF1540_00319_HMPREF1540_00319_E_vulneris:0,
fig_8616rna58_DXA30_07775_E_ulcerans:0)0.81:0.00724,
gi_undefined_C4N16_02505_E_albertii:0)0.92:0.01676)0.78:0.01261,
E_blattae_otu92:0.004)0.78:0.02469,(((E_coli_otu8:0.01561,
E_coli_otu38:0.00378)0:0.00378,E_coli_otu33:0)0:0,
(((E_coli_otu54:0.00713,gi_undefined_C4N19_02700_E_coli:0)
0.73:0.00675,(((E_coli_otu57:0,E_coli_otu43:0.00715)0.84:0.00715,
E_coli_otu53:0)0.79:0.00852,((((E_coli_otu40:0,
E_coli_otu56:0.0076)0:0.00376,E_coli_otu55:0.00703)0:0.00376,
E_coli_otu37:0)0:0.0028,(E_coli_otu41:0,E_coli_otu4:0.00715)
0.9:0.00714)0:0.00395)0.79:0.00862)0.77:0.00764,E_coli_otu36:0)
0.82:0.00761)0.89:0.04396)0.83:0.0832,(gi_undefined_C4N18_07110_E_blattae:0,
gi_undefined_FUSO3_01390_E_hermannii:0.04598)0.92:0.1457)0.97:0.1015);
tree.test<-read.tree(file = "file.newick")

and build the tree by using ape and phytools packages:

ggtree(tree.test) + geom_tiplab()

but I cannot figure out how to collapse at the keyword level. Any suggestions would be very appreciated. Thank you!

One way to do that would be to drop all the OTUs but one in each species group using the ape::drop.tip function:

library(ape)

## List of clades
clades <- c("vulneris","ulcerans","blattae","coli","hermannii","albertii","periodonticum","fergusonii")

## New tree placeholder
trimmed_tree <- tree.test

## Loop through each tip to drop
for(one_clade in clades) {
    ## Find the tips matching the species name
    species <- grep(one_clade, trimmed_tree$tip.label)
    ## Removing all the species but the first one
    trimmed_tree <- drop.tip(trimmed_tree, trimmed_tree$tip.label[species[-1]])
}

## Displaying the trimmed tree (with one OTU per species)
plot(trimmed_tree)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM