How do I find the nucleotide sequence of a protein using Biopython?
I have proteins for which I would like to find the corresponding nucleotide sequences. I also have the genome in which the protein is found. In the genome, I have found the corresponding Gene ID for the protein. However, I am having trouble getting the nucleotide sequence with the Gene ID. I have tried using Entrez Efetch:
from Bio import Entrez

Entrez.email = "dddd@gmail.com"
with open("genome.gb", "w") as out_handle:
    request = Entrez.efetch(db="gene", id="2703488", rettype="gb", retmode="text")
    out_handle.write(request.read())
    request.close()
but this only returns the following:
1. G
tail component [Escherichia virus Lambda]
Other Aliases: lambdap14
Other Designations: tail component
Annotation: NC_001416.1 (9711..10133)
ID: 2703488
Is there any way to get the actual nucleotide sequence using Efetch? Thanks in advance!
You can obtain the sequence from the NCBI nucleotide database (nuccore) using the information in the Annotation: line:
>>> from Bio import Entrez, SeqIO
>>> Entrez.email = ''
>>> request = Entrez.efetch(db="nuccore", id="NC_001416.1", rettype="fasta", seq_start="9711", seq_stop="10133")
>>> seq_record = SeqIO.read(request, "fasta")
>>> seq_record
SeqRecord(seq=Seq('ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCT...TGA', SingleLetterAlphabet()), id='NC_001416.1:9711-10133', name='NC_001416.1:9711-10133', description='NC_001416.1:9711-10133 Enterobacteria phage lambda, complete genome', dbxrefs=[])
>>> print(seq_record.seq)
ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCTGAACTGTCAGCCCTGCAGCGCATTGAGCATCTCGCCCTGATGAAACGGCAGGCAGAACAGGCGGAGTCAGACAGCAACCGGAAGTTTACTGTGGAAGACGCCATCAGAACCGGCGCGTTTCTGGTGGCGATGTCCCTGTGGCATAACCATCCGCAGAAGACGCAGATGCCGTCCATGAATGAAGCCGTTAAACAGATTGAGCAGGAAGTGCTTACCACCTGGCCCACGGAGGCAATTTCTCATGCTGAAAACGTGGTGTACCGGCTGTCTGGTATGTATGAGTTTGTGGTGAATAATGCCCCTGAACAGACAGAGGACGCCGGGCCCGCAGAGCCTGTTTCTGCGGGAAAGTGTTCGACGGTGAGCTGA
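If you have many Gene IDs, you may not want to copy the accession and coordinates out of each Annotation: line by hand. Below is a minimal sketch of how that step could be automated. The helper names `annotation_to_efetch_kwargs` and `fetch_gene_sequence` are hypothetical (they are not part of Biopython), and the regular expression assumes the Annotation: line looks like the one shown in the question. It also assumes that a gene on the minus strand is reported with a ", complement" suffix inside the parentheses, in which case `strand=2` is passed to efetch; check this against your own records before relying on it.

```python
import re

# Assumed layout of the "Annotation:" line in the Gene text report, e.g.
#   Annotation: NC_001416.1 (9711..10133)
# A ", complement" suffix inside the parentheses is assumed to mark the minus strand.
ANNOTATION_RE = re.compile(
    r"(?P<acc>[A-Za-z_]+\d+(?:\.\d+)?)\s*"
    r"\((?P<start>\d+)\.\.(?P<stop>\d+)(?P<comp>,\s*complement)?\)"
)


def annotation_to_efetch_kwargs(annotation_line):
    """Hypothetical helper: turn an Annotation line into Entrez.efetch keyword arguments."""
    match = ANNOTATION_RE.search(annotation_line)
    if match is None:
        raise ValueError(f"Unrecognised annotation line: {annotation_line!r}")
    return {
        "db": "nuccore",
        "id": match.group("acc"),
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": int(match.group("start")),
        "seq_stop": int(match.group("stop")),
        # E-utilities convention: strand=1 is the plus strand, strand=2 the minus strand.
        "strand": 2 if match.group("comp") else 1,
    }


def fetch_gene_sequence(annotation_line, email):
    """Fetch the annotated region as a SeqRecord (needs Biopython and network access)."""
    from Bio import Entrez, SeqIO  # imported here so the parser works without Biopython

    Entrez.email = email
    handle = Entrez.efetch(**annotation_to_efetch_kwargs(annotation_line))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()
```

Calling `fetch_gene_sequence("Annotation: NC_001416.1 (9711..10133)", "you@example.com")` should issue the same nuccore request as the session above; the e-mail address here is a placeholder.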