How do I find the nucleotide sequence of a protein using Biopython?

I have several proteins for which I would like to find the corresponding nucleotide sequences. I also have the genome in which each protein is found, and in that genome I have found the Gene ID corresponding to the protein. However, I am having trouble retrieving the nucleotide sequence from the Gene ID. I have tried using Entrez Efetch:

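from Bio import Entrez

Entrez.email = "dddd@gmail.com"
with open("genome.gb", "w") as out_handle:
    # Querying db="gene" returns the Gene summary record, not the sequence itself
    request = Entrez.efetch(db="gene", id="2703488", rettype="gb", retmode="text")
    out_handle.write(request.read())
    request.close()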

but this only returns the following:

1. G
tail component [Escherichia virus Lambda]
Other Aliases: lambdap14
Other Designations: tail component
Annotation:  NC_001416.1 (9711..10133)
ID: 2703488

Is there any way to get the actual nucleotide sequence using Efetch? Thanks in advance!

You can obtain the sequence from the NCBI Nucleotide database (nuccore) using the accession and coordinates given in the Annotation: line:

>>> from Bio import Entrez, SeqIO
>>> Entrez.email = ''
>>> request = Entrez.efetch(db="nuccore", id="NC_001416.1", rettype="fasta", seq_start="9711", seq_stop="10133")
>>> seq_record = SeqIO.read(request, "fasta")
>>> seq_record
SeqRecord(seq=Seq('ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCT...TGA', SingleLetterAlphabet()), id='NC_001416.1:9711-10133', name='NC_001416.1:9711-10133', description='NC_001416.1:9711-10133 Enterobacteria phage lambda, complete genome', dbxrefs=[])
>>> print(seq_record.seq)
ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCTGAACTGTCAGCCCTGCAGCGCATTGAGCATCTCGCCCTGATGAAACGGCAGGCAGAACAGGCGGAGTCAGACAGCAACCGGAAGTTTACTGTGGAAGACGCCATCAGAACCGGCGCGTTTCTGGTGGCGATGTCCCTGTGGCATAACCATCCGCAGAAGACGCAGATGCCGTCCATGAATGAAGCCGTTAAACAGATTGAGCAGGAAGTGCTTACCACCTGGCCCACGGAGGCAATTTCTCATGCTGAAAACGTGGTGTACCGGCTGTCTGGTATGTATGAGTTTGTGGTGAATAATGCCCCTGAACAGACAGAGGACGCCGGGCCCGCAGAGCCTGTTTCTGCGGGAAAGTGTTCGACGGTGAGCTGA
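
If you have many Gene IDs, copying the coordinates by hand gets tedious. The following is a minimal sketch of how the Annotation: line could be turned into efetch arguments automatically. The helper names `parse_annotation` and `fetch_region` are hypothetical, and the handling of a ", complement" suffix (mapped to the E-utilities `strand` parameter, where 2 means the minus strand) is an assumption about how minus-strand genes appear in the Gene summary; check it against your own records before relying on it.

```python
import re

# Matches e.g. "Annotation:  NC_001416.1 (9711..10133)".
# Assumption: an optional ", complement" suffix marks a minus-strand gene.
ANNOTATION_RE = re.compile(r"(\S+)\s+\((\d+)\.\.(\d+)(,\s*complement)?\)")


def parse_annotation(line):
    """Return efetch keyword arguments derived from a Gene 'Annotation:' line."""
    match = ANNOTATION_RE.search(line)
    if match is None:
        raise ValueError(f"no coordinates found in: {line!r}")
    accession, start, stop, complement = match.groups()
    return {
        "id": accession,
        "seq_start": int(start),
        "seq_stop": int(stop),
        "strand": 2 if complement else 1,  # E-utilities: 1 = plus, 2 = minus
    }


def fetch_region(line, email):
    """Fetch the annotated region from nuccore as a SeqRecord (needs network access)."""
    # Imported here so the parsing helper can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    handle = Entrez.efetch(db="nuccore", rettype="fasta", retmode="text",
                           **parse_annotation(line))
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()


params = parse_annotation("Annotation:  NC_001416.1 (9711..10133)")
print(params)
# {'id': 'NC_001416.1', 'seq_start': 9711, 'seq_stop': 10133, 'strand': 1}
```

Calling `fetch_region("Annotation:  NC_001416.1 (9711..10133)", "you@example.com")` should then issue the same request as the interactive session above, with your own address in place of the placeholder email.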
