
Protein to RNA codons

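I have this problem: we need to write code that takes a protein FASTA file and a protein sequence identifier, and computes all possible RNA combinations for that sequence in the FASTA file, on the condition that the total number of combinations is less than 5000.

I started by making an RNA codon dictionary, then wrote a function that puts the elements of the FASTA file (the amino acid sequences) into a list. I then tried to build the combinations from that list, but I get an error and cannot work out where the problem is. I would be grateful if someone could check the code and tell me what is wrong.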

import itertools

from Bio import SeqIO

RNA_codon_table = {
'A': ('GCU', 'GCC', 'GCA', 'GCG'),
'C': ('UGU', 'UGC'),
'D': ('GAU', 'GAC'),
'E': ('GAA', 'GAG'),
'F': ('UUU', 'UUC'),
'G': ('GGU', 'GGC', 'GGA', 'GGG'),
'H': ('CAU', 'CAC'),
'I': ('AUU', 'AUC', 'AUA'),
'K': ('AAA', 'AAG'),
'L': ('UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'),
'M': ('AUG',),
'N': ('AAU', 'AAC'),
'P': ('CCU', 'CCC', 'CCA', 'CCG'),
'Q': ('CAA', 'CAG'),
'R': ('CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'),
'S': ('UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'),
'T': ('ACU', 'ACC', 'ACA', 'ACG'),
'V': ('GUU', 'GUC', 'GUA', 'GUG'),
'W': ('UGG',),
'Y': ('UAU', 'UAC'),}
 
def protein_fasta (protein_file):
   protein_sequence = []
   protein = SeqIO.parse(protein_file, format = 'fasta')
   for Seqrecord in protein: 
      protein_sequence.append(Seqrecord.seq)
   print (protein_sequence)


for seq in protein_sequence:
     codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
print(list(itertools.product(codons)))

Sorry, I don't know how to attach the FASTA file, but this is the sequence inside it:

>seq_compl complete sequence
IEEATHMTPCYELHGLRWVQIQDYAINVMQCL

This is the error I get:

 KeyError                                  Traceback (most recent call last)
<ipython-input-65-3dd46947c505> in <module>
----> 1 all_combinations ('short_protein.fasta')

<ipython-input-64-45a50fffc1d9> in all_combinations(protein_file)
      5        protein_sequence.append(Seqrecord.seq)
      6 
----> 7    codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
      8    print(list(itertools.product(codons)))

<ipython-input-64-45a50fffc1d9> in <listcomp>(.0)
      5        protein_sequence.append(Seqrecord.seq)
      6 
----> 7    codons = [ list(RNA_codon_table[key]) for key in protein_sequence ]
      8    print(list(itertools.product(codons)))

 KeyError: Seq('IEEATHMTPCYELHGLRWVQIQDYAINVMQCL')

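Based on your example, the protein_sequence variable is currently declared in the local scope of the protein_fasta function.

You need to assign the result of that function to a variable before you can iterate over it.

For example, change your print to a return: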

def protein_fasta (protein_file):
   protein_sequence = []
   protein = SeqIO.parse(protein_file, format = 'fasta')
   for Seqrecord in protein: 
      protein_sequence.append(Seqrecord.seq)
   return protein_sequence

And make sure you call the function and assign its result:

protein_sequence = protein_fasta(protein_file)

Now you have something you can iterate over.
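The difference between printing and returning can be seen in isolation. This is only a minimal sketch: the function names and the "MKW" sequence are made up for illustration and no FASTA file is read.

```python
def read_and_print():
    sequences = ["MKW"]  # local name; it is gone once the function returns
    print(sequences)     # displays the list but hands nothing back


def read_and_return():
    sequences = ["MKW"]
    return sequences     # hands the list back to the caller


printed = read_and_print()    # a function without return gives None
returned = read_and_return()  # the caller now holds the list

for seq in returned:          # iterating works only on the returned value
    print(seq)
```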

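I can see one more problem, in your for loop: you are not doing anything with seq. Presumably protein_sequence should be replaced with seq there, so that each key is a single amino acid rather than a whole sequence. I also removed the list() wrapped around the RNA_codon_table lookup, because I don't think it is needed in this case, and unpacked codons with * so that itertools.product receives one iterable per residue: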

for seq in protein_sequence:
    codons = [ RNA_codon_table[key] for key in seq ]
    print(list(itertools.product(*codons)))
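To see why the original lookup raised a KeyError and what the * changes, here is a small sketch on a made-up three-residue peptide. The name codon_table and the peptide "MKW" are assumptions for the demo; the codons are a subset of the table in the question.

```python
import itertools

# Subset of the question's codon table, enough for the demo peptide.
codon_table = {
    'M': ('AUG',),
    'K': ('AAA', 'AAG'),
    'W': ('UGG',),
}

peptide = "MKW"  # hypothetical short sequence

# Using the whole sequence as one key fails, as in the traceback.
try:
    codon_table[peptide]
    whole_lookup_failed = False
except KeyError:
    whole_lookup_failed = True

# Iterating over the sequence gives one residue, i.e. one valid key, at a time.
codons = [codon_table[residue] for residue in peptide]

# Without unpacking, product() sees a single iterable and yields 1-tuples.
not_unpacked = list(itertools.product(codons))

# With unpacking, product() picks one codon per position.
unpacked = ["".join(combo) for combo in itertools.product(*codons)]
print(unpacked)  # ['AUGAAAUGG', 'AUGAAGUGG']
```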

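Your protein string yields tens of trillions of combinations, far more than the limit of 5000, so generate them lazily instead of building a list: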

from itertools import product, islice

def protGen(proteins):
    # Lazily yield one RNA string per combination of codons
    for codons in product(*(RNA_codon_table[P] for P in proteins)):
        yield "".join(codons)

Counting the combinations:

proteins = "IEEATHMTPCYELHGLRWVQIQDYAINVMQCL"
    
count = 1
for P in proteins: count *= len(RNA_codon_table[P])

print(count) # 37,572,373,905,408 combinations

output:

for protSeq in islice(protGen(proteins),500): # first 500
    print(protSeq)

AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUUUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUUUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUU
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUC
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGUCUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCUUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCUUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUU
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUC
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAAUGCCUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUUUA
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUUUG
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUCUU
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUCUC
AUUGAAGAAGCUACUCAUAUGACUCCUUGUUAUGAAUUACAUGGUUUACGUUGGGUUCAAAUUCAAGAUUAUGCUAUUAAUGUUAUGCAGUGUCUA
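Since the question only wants the combinations when there are fewer than 5000 of them, the count can be checked before anything is enumerated. The following is a sketch under assumptions, not a definitive solution: the function name rna_combinations, the dictionary of sequences keyed by identifier, and the "short_demo" record (a fragment of the posted sequence) are all hypothetical. With Biopython, such a dictionary could presumably be built from SeqIO.to_dict(SeqIO.parse(...)) and str(record.seq).

```python
from itertools import product
from math import prod

RNA_codon_table = {
    'A': ('GCU', 'GCC', 'GCA', 'GCG'),
    'C': ('UGU', 'UGC'),
    'D': ('GAU', 'GAC'),
    'E': ('GAA', 'GAG'),
    'F': ('UUU', 'UUC'),
    'G': ('GGU', 'GGC', 'GGA', 'GGG'),
    'H': ('CAU', 'CAC'),
    'I': ('AUU', 'AUC', 'AUA'),
    'K': ('AAA', 'AAG'),
    'L': ('UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'),
    'M': ('AUG',),
    'N': ('AAU', 'AAC'),
    'P': ('CCU', 'CCC', 'CCA', 'CCG'),
    'Q': ('CAA', 'CAG'),
    'R': ('CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'),
    'S': ('UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'),
    'T': ('ACU', 'ACC', 'ACA', 'ACG'),
    'V': ('GUU', 'GUC', 'GUA', 'GUG'),
    'W': ('UGG',),
    'Y': ('UAU', 'UAC'),
}


def rna_combinations(sequences, seq_id, limit=5000):
    """Return every RNA string coding for sequences[seq_id].

    Raises ValueError when the number of combinations is not below limit.
    """
    protein = sequences[seq_id]
    codon_choices = [RNA_codon_table[residue] for residue in protein]
    # Multiply the number of codons per residue; nothing is enumerated yet.
    total = prod(len(choices) for choices in codon_choices)
    if total >= limit:
        raise ValueError(f"{seq_id}: {total} combinations, limit is {limit}")
    return ["".join(combo) for combo in product(*codon_choices)]


# Hypothetical records keyed by FASTA identifier.
sequences = {
    "seq_compl": "IEEATHMTPCYELHGLRWVQIQDYAINVMQCL",
    "short_demo": "MTPCYEL",  # fragment of the sequence above
}

short = rna_combinations(sequences, "short_demo")
print(len(short))  # 768

try:
    rna_combinations(sequences, "seq_compl")
except ValueError as err:
    print(err)  # the full sequence is rejected before enumeration
```

Checking the count first means the rejected case costs one multiplication per residue and never starts iterating over trillions of strings.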

