Retrieve DNA sequence using a gene identifier of a protein
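I am using Biopython to try to retrieve the DNA sequence corresponding to a protein for which I have the GI (71743840). On the NCBI page this is very simple: I just need to look up the RefSeq. My problem comes when coding it in Python with the NCBI fetch utilities: I cannot find a way to retrieve any field that would help me get to the DNA.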
handle = Entrez.efetch(db="nucleotide", id=blast_record.alignments[0].hit_id, rettype="gb", retmode="text")
seq_record=SeqIO.read(handle,"gb")
There is a lot of information in seq_record.features, but there must be a simpler and more obvious way to do this. Any help would be appreciated. Thanks!
You can try accessing the annotations of the SeqRecord:
seq_record=SeqIO.read(handle,"gb")
nucleotide_accession = seq_record.annotations["db_source"]
In your case, nucleotide_accession is "REFSEQ: accession NM_000673.4".
Now see whether you can parse these annotations. The following has only been checked against this one test case:
nucl_id = nucleotide_accession.split()[-1]
handle = Entrez.efetch(db="nucleotide",
                       id=nucl_id,
                       rettype="gb",
                       retmode="text")
seq_record = SeqIO.read(handle, "gb")
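Taking the last whitespace-separated token works for the string shown above, but it breaks as soon as db_source carries any trailing text. A slightly more defensive parse could look like the sketch below. The helper name accession_from_db_source and the regular expression are illustrative assumptions, not part of Biopython, and the pattern has only been reasoned about for the "REFSEQ: accession ..." form quoted in this answer. Other source databases format db_source differently and may yield an accession that is not a nucleotide record.

```python
import re


def accession_from_db_source(db_source):
    """Return the token following the word 'accession', or None if absent.

    Hypothetical helper: it assumes a db_source annotation shaped like
    'REFSEQ: accession NM_000673.4' and makes no claim about other formats.
    """
    # Capture an identifier made of letters, digits and underscores,
    # optionally followed by a '.<version>' suffix.
    match = re.search(r"accession\s+([A-Za-z0-9_]+(?:\.\d+)?)", db_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(accession_from_db_source("REFSEQ: accession NM_000673.4"))
```

Returning None instead of raising lets the caller decide what to do with records whose db_source does not mention an accession.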
You can take advantage of Entrez.elink, requesting the UID of the protein sequence that corresponds to the UID of a nucleotide sequence:
from Bio import Entrez
from Bio import SeqIO

email = 'seb@free.fr'
term = 'NM_207618.2'  # for example, an accession.version

### first step, we search for the nucleotide sequence of interest
h_search = Entrez.esearch(
    db='nucleotide', email=email, term=term)
record = Entrez.read(h_search)
h_search.close()

### second step, we fetch that nt sequence using its UID
handle_nt = Entrez.efetch(
    db='nucleotide', email=email,
    id=record['IdList'][0], rettype='fasta')  # here is the UID

### third and most important, we 'link' the UID of the nucleotide
# sequence to the corresponding protein from the appropriate database
results = Entrez.read(Entrez.elink(
    dbfrom='nucleotide', linkname='nucleotide_protein',
    email=email, id=record['IdList'][0]))

### last, we fetch the amino acid sequence
handle_aa = Entrez.efetch(
    db='protein', email=email,
    id=results[0]['LinkSetDb'][0]['Link'][0]['Id'],  # here is the key...
    rettype='fasta')
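The example above goes from nucleotide to protein, whereas the question starts from a protein GI. A minimal sketch of the reverse direction follows, assuming the Entrez link name 'protein_nuccore' is the appropriate one for this protein (check it against the NCBI link descriptions before relying on it). The function names linked_ids and nucleotide_ids_for_protein are hypothetical. Biopython is imported inside the second function only so that the pure helper can be used and tested without network access.

```python
def linked_ids(elink_result, linkname=None):
    """Collect target UIDs from a parsed Entrez.elink result.

    If linkname is given, only link sets with that LinkName are used.
    """
    ids = []
    for linkset in elink_result:
        for linkset_db in linkset.get("LinkSetDb", []):
            if linkname is None or linkset_db.get("LinkName") == linkname:
                ids.extend(link["Id"] for link in linkset_db.get("Link", []))
    return ids


def nucleotide_ids_for_protein(protein_id, email, linkname="protein_nuccore"):
    """Ask NCBI for nucleotide UIDs linked to a protein UID (needs network)."""
    from Bio import Entrez  # lazy import: only needed for the live query

    Entrez.email = email
    handle = Entrez.elink(dbfrom="protein", db="nuccore",
                          id=protein_id, linkname=linkname)
    result = Entrez.read(handle)
    handle.close()
    return linked_ids(result, linkname)
```

Something like nucleotide_ids_for_protein("71743840", "you@example.org") would then return a list of UIDs that can be passed to Entrez.efetch with db='nucleotide'. The list may be empty or contain several entries, so indexing [0] blindly, as in the example above, is best avoided.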