
How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid?

I have a list of taxids that looks like this:

1204725
2162
1300163
420247

From the taxids above, I want to produce a file with the taxonomic id at each rank, with the columns in this order:

kingdom_id      phylum_id       class_id        order_id        family_id       genus_id        species_id   

I am using the package ete3. I use its ete-ncbiquery tool, which reports the lineage for the ids above. (I run it from my Linux laptop with the command below.)

ete3 ncbiquery --search 1204725 2162 1300163 420247 --info

The result looks like this:

# Taxid Sci.Name    Rank    Named Lineage   Taxid Lineage
2162    Methanobacterium formicicum species root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum   1,131567,2157,28890,183925,2158,2159,2160,2162
1204725 Methanobacterium formicicum DSM 3637    no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum,Methanobacterium formicicum DSM 3637  1,131567,2157,28890,183925,2158,2159,2160,2162,1204725
420247  Methanobrevibacter smithii ATCC 35061   no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter,Methanobrevibacter smithii,Methanobrevibacter smithii ATCC 35061   1,131567,2157,28890,183925,2158,2159,2172,2173,420247

I have no idea which of these ids, if any, correspond to the ranks I am looking for.
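
As a first orientation, the two lineage columns in the output above appear to be positionally aligned: the n-th name in Named Lineage goes with the n-th id in Taxid Lineage. The sketch below pairs them up for the Methanobacterium formicicum row. The helper name pair_lineage is hypothetical, and the naive comma split assumes no scientific name in the lineage contains a comma.

```python
def pair_lineage(named_lineage, taxid_lineage):
    """Pair each taxid with its name, assuming both columns are in the same order."""
    names = named_lineage.split(',')
    ids = [int(t) for t in taxid_lineage.split(',')]
    if len(names) != len(ids):
        # A name containing a comma would break the naive split above.
        raise ValueError('lineage columns have different lengths')
    return list(zip(ids, names))


# Columns copied from the row for taxid 2162 in the output above.
named = ('root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,'
         'Methanobacteriales,Methanobacteriaceae,Methanobacterium,'
         'Methanobacterium formicicum')
taxids = '1,131567,2157,28890,183925,2158,2159,2160,2162'

pairs = pair_lineage(named, taxids)
for taxid, name in pairs:
    print(taxid, name)
```

This only attaches names to ids; it does not say which rank each id has.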

The following code:

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)
    lineage2ranks = ncbi.get_rank(lineage)
    ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

def main(taxids, desired_ranks, path):
    with open(path, 'w') as csvfile:
        fieldnames = ['{}_id'.format(rank) for rank in desired_ranks]
        writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=fieldnames)
        writer.writeheader()
        for taxid in taxids:
            writer.writerow(get_desired_ranks(taxid, desired_ranks))

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    path = 'taxids.csv'
    main(taxids, desired_ranks, path)

produces a file that looks like this:

kingdom_id  phylum_id   class_id    order_id    family_id   genus_id    species_id
<not present>   28890   183925  2158    2159    2160    2162
<not present>   28890   183925  2158    2159    2160    2162
<not present>   28890   183925  2158    2159    2160    2162
<not present>   28890   183925  2158    2159    2172    2173

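The kingdom_id column comes out as <not present> presumably because these archaeal lineages contain no node whose rank is literally "kingdom". A minimal sketch of one way to handle that is to let a desired rank fall back to alternative rank names. It works on a plain taxid-to-rank dictionary of the shape returned by get_rank, so it runs without the NCBI database. The function name pick_ranks, the fallbacks parameter, and the ranks given here for 1, 131567 and 2157 ("no rank", "no rank", "superkingdom") are assumptions, not taken from the output above; newer NCBI dumps may use a different name such as "domain".

```python
def pick_ranks(lineage2ranks, desired_ranks, fallbacks=None, missing='<not present>'):
    """Build one output row, trying fallback rank names when a rank is absent."""
    fallbacks = fallbacks or {}
    ranks2lineage = {rank: taxid for taxid, rank in lineage2ranks.items()}
    row = {}
    for rank in desired_ranks:
        candidates = [rank] + list(fallbacks.get(rank, []))
        row['{}_id'.format(rank)] = next(
            (ranks2lineage[r] for r in candidates if r in ranks2lineage), missing)
    return row


# Assumed get_rank-style result for the lineage of taxid 2162.
lineage2ranks = {
    1: 'no rank', 131567: 'no rank', 2157: 'superkingdom',
    28890: 'phylum', 183925: 'class', 2158: 'order',
    2159: 'family', 2160: 'genus', 2162: 'species',
}
desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']

row = pick_ranks(lineage2ranks, desired_ranks,
                 fallbacks={'kingdom': ['superkingdom', 'domain']})
print(row)
```

Whether a superkingdom id is an acceptable stand-in for a kingdom id depends on what the downstream file is used for.
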
Take the Taxid Lineage numbers from your results and pass them to ete3's get_rank method. As an example:

from ete3 import NCBITaxa
ncbi = NCBITaxa()

print(ncbi.get_rank([9606, 9443]))
# e.g. {9443: 'order', 9606: 'species'}

Presumably the resulting dictionary contains the rank of every ID, including any intermediate "no rank" IDs that you may want to filter out.
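
Filtering out the "no rank" entries could look like the sketch below. It takes the lineage as an ordered list plus a taxid-to-rank dictionary of the kind get_rank returns, and keeps the lineage order. The helper name drop_unranked is hypothetical, and the ranks assumed for 1, 131567 and 2157 are not taken from the output shown in the question.

```python
def drop_unranked(lineage, ranks, unranked=('no rank',)):
    """Return (taxid, rank) pairs in lineage order, skipping unranked nodes."""
    return [(taxid, ranks[taxid]) for taxid in lineage
            if taxid in ranks and ranks[taxid] not in unranked]


# Taxid Lineage of 1204725 from the question; ranks partly assumed.
lineage = [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162, 1204725]
ranks = {
    1: 'no rank', 131567: 'no rank', 2157: 'superkingdom',
    28890: 'phylum', 183925: 'class', 2158: 'order',
    2159: 'family', 2160: 'genus', 2162: 'species', 1204725: 'no rank',
}

ranked = drop_unranked(lineage, ranks)
for taxid, rank in ranked:
    print(rank, taxid)
```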

You can also use the R package taxonomizr. The package takes a bit of time to download the necessary files, but after that it's quite fast and easy.

library("taxonomizr")
getNamesAndNodes()
taxaNodes <- read.nodes('nodes.dmp')
taxaNames <- read.names('names.dmp')
taxaID <- c("1204725", "2162", "1300163", "420247")

getNamesAndNodes downloads the names.dmp and nodes.dmp files from NCBI.
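
To make it clearer what nodes.dmp contributes, here is a rough Python sketch (not part of taxonomizr) that reads nodes.dmp-style lines, where fields are separated by a tab, a pipe and a tab, and the first three fields are the taxid, the parent taxid and the rank. It then walks from a taxid up to the root and collects the ids at the desired ranks. The function names are hypothetical, the sample rows are trimmed to three fields, and the ranks for 1, 131567 and 2157 are assumptions; the parent links follow the Taxid Lineage shown in the question.

```python
def parse_nodes(lines):
    """Map taxid -> (parent_taxid, rank) from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        fields = [f.strip() for f in line.rstrip('\n').rstrip('|').split('|')]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def ranks_for(taxid, nodes, desired_ranks):
    """Walk up the parent links and record the first taxid seen at each desired rank."""
    found = {}
    seen = set()
    while taxid in nodes and taxid not in seen:  # the root is its own parent
        seen.add(taxid)
        parent, rank = nodes[taxid]
        if rank in desired_ranks:
            found.setdefault(rank, taxid)
        taxid = parent
    return found


# Trimmed sample rows: (taxid, parent taxid, rank).
ROWS = [
    (1, 1, 'no rank'), (131567, 1, 'no rank'), (2157, 131567, 'superkingdom'),
    (28890, 2157, 'phylum'), (183925, 28890, 'class'), (2158, 183925, 'order'),
    (2159, 2158, 'family'), (2160, 2159, 'genus'), (2162, 2160, 'species'),
    (1204725, 2162, 'no rank'),
]
SAMPLE_LINES = ['{}\t|\t{}\t|\t{}\t|'.format(*row) for row in ROWS]

nodes = parse_nodes(SAMPLE_LINES)
desired = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
found = ranks_for(1204725, nodes, desired)
print(found)
```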
