
Changing phylogenetic tree tip labels in R (read.tree) for all tips (e.g. adding in a ' ' or '_')

I'm reading a phylogenetic tree into R:

library(ape)
library(geiger)
library(caper)

taxatree <- read.tree("newicktest.tre")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")

LWEVIYRcombodataPGLS <- data.frame(Sum.of.percentage = LWEVIYRcombodata$Sum.of.percentage, OGT = LWEVIYRcombodata$OGT, Species = LWEVIYRcombodata$Species)

comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, "Species")

However, I get the following error message:

Error in comparative.data(taxatree, LWEVIYRcombodataPGLS, "Species") : 
  No tips are common to the dataset and phylogeny

My tree is this:

(('Nanoarchaeum equitans':4.0,('Aeropyrum pernix':4.0,'Pyrobaculum aerophilum':4.0,('Sulfolobus tokodaii':4.0,'Sulfolobus solfataricus':4.0,'Sulfolobus acidocaldarius':4.0):4.0):4.0,('Methanopyrus kandleri':4.0,('Methanothermobacter thermautotrophicus':4.0,'Methanosphaera stadtmanae':4.0):4.0,('Picrophilus torridus':4.0,('Thermoplasma volcanium':4.0,'Thermoplasma acidophilum':4.0):4.0):4.0,('Thermococcus kodakarensis':4.0,('Pyrococcus horikoshii':4.0,'Pyrococcus abyssi':4.0,'Pyrococcus furiosus':4.0):4.0):4.0,('Natronomonas pharaonis':4.0,'Haloarcula marismortui':4.0):4.0,'Archaeoglobus fulgidus':4.0,(('Methanococcoides burtonii':4.0,('Methanosarcina acetivorans':4.0,'Methanosarcina mazei':4.0,'Methanosarcina barkeri':4.0):4.0):4.0,'Methanospirillum hungatei':4.0):4.0,('Methanococcus maripaludis':4.0,'Methanocaldococcus jannaschii':4.0):4.0):4.0):4.0,('Candidatus Koribacter versatilis Ellin345':4.0,'Fusobacterium nucleatum subsp. nucleatum ATCC 25586':4.0,'Aquifex aeolicus':4.0,('Trichormus variabilis':4.0,('Thermosynechococcus elongatus':4.0,'Synechococcus elongatus':4.0):4.0):4.0,'Thermotoga maritima':4.0,('Mesoplasma florum':4.0,('Ureaplasma urealyticum':4.0,('Mycoplasma penetrans':4.0,'Mycoplasma mobile':4.0,'Mycoplasma synoviae':4.0,'Mycoplasma pulmonis':4.0,'Mycoplasma pneumoniae':4.0,'Mycoplasma hyopneumoniae':4.0,'Mycoplasma genitalium':4.0,'Mycoplasma gallisepticum':4.0):4.0):4.0):4.0,('Bifidobacterium longum':4.0,'Thermobifida fusca':4.0,('Streptomyces avermitilis':4.0,'Streptomyces coelicolor':4.0):4.0,'Cutibacterium acnes':4.0,('Nocardia farcinica':4.0,('Mycobacterium tuberculosis':4.0,'Mycobacterium leprae':4.0,'Mycobacterium bovis':4.0,'Mycobacterium avium':4.0):4.0,('Corynebacterium efficiens':4.0,'Corynebacterium jeikeium':4.0,'Corynebacterium glutamicum':4.0,'Corynebacterium diphtheriae':4.0):4.0):4.0,'Leifsonia xyli':4.0):4.0,((('Caldanaerobacter subterraneus':4.0,'Carboxydothermus hydrogenoformans':4.0,'Moorella 
thermoacetica':4.0):4.0,('Desulfitobacterium hafniense':4.0,'Symbiobacterium thermophilum':4.0,('Clostridium tetani':4.0,'Clostridium perfringens':4.0,'Clostridium acetobutylicum':4.0):4.0):4.0):4.0,((('Lactobacillus johnsonii':4.0,'Lactobacillus salivarius':4.0,'Lactobacillus sakei':4.0,'Lactobacillus plantarum':4.0,'Lactobacillus acidophilus':4.0):4.0,'Enterococcus faecalis':4.0,('Lactococcus lactis subsp. lactis':4.0,('Streptococcus pyogenes':4.0,'Streptococcus pneumoniae':4.0,'Streptococcus agalactiae':4.0,'Streptococcus mutans':4.0,'Streptococcus thermophilus':4.0):4.0):4.0):4.0,(('Listeria innocua':4.0,'Listeria monocytogenes':4.0):4.0,('Oceanobacillus iheyensis':4.0,'Geobacillus kaustophilus':4.0,('[Bacillus thuringiensis] serovar konkukian':4.0,'Bacillus halodurans':4.0,'Bacillus clausii':4.0,'Bacillus subtilis':4.0,'Bacillus licheniformis':4.0,'Bacillus cereus':4.0,'Bacillus anthracis':4.0):4.0):4.0,('Staphylococcus aureus subsp. aureus MRSA252':4.0,'Staphylococcus saprophyticus':4.0,'Staphylococcus haemolyticus':4.0,'Staphylococcus epidermidis':4.0):4.0):4.0):4.0):4.0,('Chlorobaculum tepidum TLS':4.0,'Pelodictyon luteolum':4.0):4.0,('Salinibacter ruber':4.0,('Porphyromonas gingivalis':4.0,('Bacteroides thetaiotaomicron':4.0,'Bacteroides fragilis':4.0):4.0):4.0):4.0,('Chlamydia pneumoniae AR39':4.0,'Chlamydia muridarum':4.0,'Chlamydia trachomatis':4.0):4.0,(('Deinococcus geothermalis':4.0,'Deinococcus radiodurans':4.0):4.0,'Thermus thermophilus':4.0):4.0,('Leptospira interrogans':4.0,(('Borreliella bavariensis':4.0,'Borreliella burgdorferi B31':4.0):4.0,('Treponema pallidum':4.0,'Treponema denticola':4.0):4.0):4.0):4.0,('Bdellovibrio bacteriovorus':4.0,('Caulobacter vibrioides CB15':4.0,('Ruegeria pomeroyi':4.0,'Rhodobacter sphaeroides':4.0):4.0,('Rickettsia typhi':4.0,'Rickettsia prowazekii':4.0,'Rickettsia conorii':4.0):4.0,('Erythrobacter litoralis':4.0,('Novosphingobium aromaticivorans':4.0,'Zymomonas mobilis':4.0):4.0):4.0,(('Magnetospirillum 
magneticum':4.0,'Rhodospirillum rubrum':4.0):4.0,'Gluconobacter oxydans':4.0):4.0,('Brucella melitensis':4.0,('Bartonella henselae':4.0,'Bartonella quintana':4.0):4.0,'Mesorhizobium loti':4.0,('Rhodopseudomonas palustris':4.0,('Nitrobacter winogradskyi':4.0,'Nitrobacter hamburgensis':4.0):4.0,'Bradyrhizobium japonicum':4.0):4.0,('Rhizobium etli':4.0,'Sinorhizobium meliloti':4.0,'Agrobacterium tumefaciens':4.0):4.0):4.0):4.0,(('Chromobacterium violaceum':4.0,('Neisseria meningitidis':4.0,'Neisseria gonorrhoeae':4.0):4.0):4.0,('Thiobacillus denitrificans':4.0,('Nitrosospira multiformis':4.0,'Nitrosomonas europaea':4.0):4.0,'Methylobacillus flagellatus':4.0):4.0,('Rhodoferax ferrireducens':4.0,('Bordetella pertussis':4.0,'Bordetella parapertussis':4.0,'Bordetella bronchiseptica':4.0):4.0,('Cupriavidus metallidurans':4.0,'Burkholderia thailandensis':4.0,'Ralstonia solanacearum':4.0):4.0):4.0):4.0,(('Hahella chejuensis':4.0,'Chromohalobacter salexigens':4.0):4.0,'Legionella pneumophila subsp. pneumophila':4.0,'Saccharophagus degradans':4.0,('Francisella tularensis subsp. tularensis SCHU S4':4.0,'Hydrogenovibrio crunogenus':4.0):4.0,'Nitrosococcus oceani':4.0,('Pasteurella multocida':4.0,('[Haemophilus] ducreyi 35000HP':4.0,'Haemophilus influenzae':4.0):4.0):4.0,('Aliivibrio fischeri ES114':4.0,'Photobacterium profundum':4.0,('Vibrio vulnificus':4.0,'Vibrio parahaemolyticus':4.0,'Vibrio cholerae':4.0):4.0):4.0,('Photorhabdus luminescens':4.0,('Sodalis glossinidius':4.0,'Pectobacterium atrosepticum':4.0):4.0,('Yersinia pseudotuberculosis':4.0,'Yersinia pestis':4.0):4.0,('Salmonella enterica subsp. enterica serovar Typhi str. 
CT18':4.0,('Shigella sonnei':4.0,'Shigella flexneri':4.0,'Shigella dysenteriae':4.0,'Shigella boydii':4.0):4.0,'Escherichia coli':4.0):4.0):4.0,'Methylococcus capsulatus':4.0,('Xylella fastidiosa':4.0,('Xanthomonas oryzae':4.0,'Xanthomonas citri':4.0,'Xanthomonas campestris':4.0):4.0):4.0,('Psychrobacter arcticus':4.0,('Pseudomonas protegens':4.0,'Pseudomonas savastanoi':4.0,'Pseudomonas putida':4.0,'Pseudomonas aeruginosa':4.0):4.0):4.0,('Idiomarina loihiensis':4.0,('Shewanella denitrificans':4.0,'Shewanella oneidensis':4.0):4.0,'Colwellia psychrerythraea':4.0,'Pseudoalteromonas haloplanktis':4.0):4.0):4.0,(('Sulfurimonas denitrificans':4.0,'Wolinella succinogenes':4.0,('Helicobacter hepaticus':4.0,'Helicobacter pylori':4.0):4.0):4.0,'Campylobacter jejuni':4.0):4.0,('Anaeromyxobacter dehalogenans':4.0,'Desulfotalea psychrophila':4.0,('Desulfovibrio alaskensis':4.0,'Desulfovibrio vulgaris':4.0):4.0,(('Geobacter sulfurreducens':4.0,'Geobacter metallireducens':4.0):4.0,'Pelobacter carbinolicus':4.0):4.0):4.0):4.0):4.0);

A subset of my input data looks like this:

+-------------------------------+-----+-------------------+
|            Species            | OGT | Sum of percentage |
+-------------------------------+-----+-------------------+
| Aeropyrum pernix              |  95 |     46.3165467333 |
| Argobacterium fabrum          |  26 |     39.0114463099 |
| Anaeromyxobacter dehalogenans |  27 |     40.7932155627 |
| Aquifex aeolicus              |  85 |     45.4972652338 |
| Archaeoglobus fulgidus        |  83 |     44.7570927331 |
| Bacillus anthracis            |  30 |     40.9076162356 |
| Bacillus cereus               |  30 |     40.8716699079 |
| Bacillus clausii              |  30 |     40.3212556402 |
+-------------------------------+-----+-------------------+

I know some of the labels may be slightly wrong, but that shouldn't mean that none of them match.

Interestingly when read into R, the phylogenetic tree's tip labels do not have any spaces:

'Anaeromyxobacterdehalogenans''Aeropyrumpernix'

Yet, as you can see above, the tree file does have spaces in its labels. I'd rather not edit all the CSV files (I have about a million of them). Is there a way to edit the tree tip labels once the tree has been read into R, or is there another solution to this issue?

Thanks,

As patL mentioned in the comments, the answer is to use gsub:

LWEVIYRcombodataPGLS$Species<-gsub(" ", "", LWEVIYRcombodataPGLS$Species)
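The same idea can be applied to both sides at once, which also makes it easy to see which names still fail to match. The following is only a minimal sketch in base R: `normalise_name` is a hypothetical helper (not part of ape or caper), the two vectors are small stand-ins for `taxatree$tip.label` and `LWEVIYRcombodataPGLS$Species`, and stripping single quotes and underscores as well as spaces is an assumption about how the labels may differ.

```r
# Hypothetical helper: reduce a name to a comparison key by dropping
# spaces, underscores and single quotes.
normalise_name <- function(x) gsub("[ _']", "", as.character(x))

# Stand-ins for taxatree$tip.label and LWEVIYRcombodataPGLS$Species
tip_labels <- c("'Aeropyrumpernix'", "'Aquifexaeolicus'", "'Bacillusanthracis'")
species    <- c("Aeropyrum pernix", "Argobacterium fabrum", "Aquifex aeolicus")

tip_keys     <- normalise_name(tip_labels)
species_keys <- normalise_name(species)

# Names present on both sides after normalisation
matched <- intersect(species_keys, tip_keys)

# Names that would still be dropped, reported in their original spelling
missing_in_tree <- species[!species_keys %in% tip_keys]
missing_in_data <- tip_labels[!tip_keys %in% species_keys]

print(matched)
print(missing_in_tree)
print(missing_in_data)
```

With the real objects, the normalised values would be assigned back to `taxatree$tip.label` and `LWEVIYRcombodataPGLS$Species` before calling `comparative.data`, so the CSV files themselves stay untouched. Anything listed in `missing_in_tree` (for example a misspelled species name) would still need fixing by hand.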
