
How to translate RNA to protein without BioPython library

Is there an easy way to convert protein to RNA using a dictionary and the .replace function?

Also, I have no idea how to code all the possible variants that arise from the redundancy of the RNA and DNA code, where a single amino acid can be encoded by different RNA triplets.

I think maybe it should look like this:

RNA = input("")
RNA_dictionary = {
  "GCA":"A", "GCC":"A", "GCG":"A", "GCU":"A",
  "UGC":"C", "UGU":"C", "GAC":"D", "GAU":"D",
  "GAA":"E", "GAG":"E", "UUC":"F", "UUU":"F",
  "GGA":"G", "GGC":"G", "GGG":"G", "GGU":"G",
  "CAC":"H", "CAU":"H", "AUA":"I", "AUC":"I",
  "AUU":"I", "AAA":"K", "AAG":"K", "UUA":"L",
  "UUG":"L", "CUA":"L", "CUC":"L", "CUG":"L",
  "CUU":"L", "AUG":"M", "AAC":"N", "AAU":"N",
  "CCA":"P", "CCC":"P", "CCG":"P", "CCU":"P",
  "CAA":"Q", "CAG":"Q", "AGA":"R", "AGG":"R",
  "CGA":"R", "CGC":"R", "CGU":"R", "CGG":"R",
  "AGC":"S", "AGU":"S", "UCA":"S", "UCC":"S",
  "UCG":"S", "UCU":"S", "ACA":"T", "ACC":"T",
  "ACG":"T", "ACU":"T", "GUA":"V", "GUC":"V",
  "GUG":"V", "GUU":"V", "UGG":"W", "UAC":"Y",
  "UAU":"Y", "UAG":"!", "UAA":"!", "UGA":"!"
}

reverse_translation = RNA_dictionary.replace #(Have no idea how to insert here the input RNA)
print (reverse_translation)

I know it can all be done with just one function from BioPython. Maybe my way of learning bioinformatics, and coding in general, is a bit weird. But I like it this way, and I feel that I really understand how the code works rather than just memorizing it like a poem.

You can use the keys of your dictionary to build a regular expression that finds the triplets. Then use the dictionary in a re.sub callback function to make the replacements:

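import re

# Alternation of all 64 codons, e.g. "GCA|GCC|GCG|..."
regex = "|".join(RNA_dictionary.keys())
# Each matched codon is looked up in the dictionary and replaced by its amino acid
translation = re.sub(regex, lambda m: RNA_dictionary[m.group()], RNA)

print(translation)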
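If the regular expression feels opaque, the same translation can be done by slicing the string three bases at a time. This is a swapped-in technique, not the approach from the answer; it is a minimal sketch that assumes the reading frame starts at the first base and the input contains only A, C, G and U. The hypothetical CODON_TABLE is built compactly from the standard codon order and holds the same 64 entries as RNA_dictionary in the question (stop codons mapped to "!").

```python
# Hypothetical compact codon table: enumerating codons in U, C, A, G order
# for each position lines them up with this standard one-letter string
# ("!" marks stop codons, as in the question's RNA_dictionary).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY!!CC!WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna: str) -> str:
    """Translate RNA codon by codon, with the reading frame starting at index 0."""
    rna = rna.upper()
    protein = []
    # Step three bases at a time; a trailing partial codon is ignored
    for i in range(0, len(rna) - len(rna) % 3, 3):
        protein.append(CODON_TABLE[rna[i:i + 3]])
    return "".join(protein)


print(translate("AUGGCCAAAUAA"))  # → MAK!
```

Unlike re.sub, which leaves unmatched characters in place, this version drops a trailing partial codon and raises KeyError on anything that is not a valid codon (for example DNA containing T).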

Once you have inverted your dict with inv_dict = {v: k for k, v in ini_dict.items()}, as you mentioned, you can use this:

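seq = ""
for aa in prot:  # prot: the protein sequence in one-letter codes
    codon = inv_dict[aa]  # KeyError for an unknown residue instead of a confusing None
    seq += codon

print(seq)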
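Inverting the dict one-to-one throws away synonymous codons: each later key overwrites the previous one, so every amino acid ends up with a single codon. A minimal sketch, assuming the same one-letter table as the question (rebuilt compactly here as the hypothetical CODON_TABLE), that groups codons into lists instead so the redundancy is kept:

```python
# Hypothetical compact codon table (same 64 entries as RNA_dictionary).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY!!CC!WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# A one-to-one inversion keeps only the last codon seen per amino acid
inv_dict = {v: k for k, v in CODON_TABLE.items()}
print(inv_dict["L"])  # → CUG

# Grouping into lists keeps every synonymous codon
synonyms = {}
for codon, aa in CODON_TABLE.items():
    synonyms.setdefault(aa, []).append(codon)
print(synonyms["L"])  # → ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG']


def back_translate(prot: str) -> str:
    """Return one RNA sequence for prot, using the first listed codon per residue."""
    return "".join(synonyms[aa][0] for aa in prot)


print(back_translate("MAK"))  # → AUGGCUAAA
```

back_translate simply takes the first codon in each list; random.choice(synonyms[aa]) would pick one at random instead.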

It doesn't account for the redundancy (the inverted dict keeps only one codon per amino acid), but keep in mind that even a very short protein sequence has more than a thousand possible nucleotide sequences.

For example, for the protein sequence:

AlaProLysPheTrpIleCysAla

you have 4 * 4 * 2 * 2 * 1 * 3 * 2 * 4 = 1536 possible sequences, one factor per residue giving its number of codons (Ala 4, Pro 4, Lys 2, Phe 2, Trp 1, Ile 3, Cys 2, Ala 4).
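The count is the product of the number of synonymous codons per residue, and itertools.product can enumerate the variants themselves. A minimal sketch, assuming the same codon table (rebuilt here as the hypothetical CODON_TABLE) and a one-letter protein string; "APKFWICA" is AlaProLysPheTrpIleCysAla:

```python
from itertools import product
from math import prod

# Hypothetical compact codon table (same 64 entries as RNA_dictionary).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY!!CC!WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# amino acid -> list of all codons that encode it
synonyms = {}
for codon, aa in CODON_TABLE.items():
    synonyms.setdefault(aa, []).append(codon)


def count_variants(prot: str) -> int:
    """Number of RNA sequences encoding prot: product of codons per residue."""
    return prod(len(synonyms[aa]) for aa in prot)


def all_variants(prot: str):
    """Yield every RNA sequence encoding prot, one codon combination at a time."""
    for codons in product(*(synonyms[aa] for aa in prot)):
        yield "".join(codons)


protein = "APKFWICA"  # Ala-Pro-Lys-Phe-Trp-Ile-Cys-Ala
print(count_variants(protein))  # → 1536
variants = list(all_variants(protein))
print(len(variants), variants[0])  # → 1536 GCUCCUAAAUUUUGGAUUUGUGCU
```

all_variants is a generator because the number of sequences grows exponentially with protein length; calling list() on it is only reasonable for very short proteins like this one.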
