RNA to PROTEIN program questions

I have a few issues with my code, and I'd appreciate some help.

The first part of the program is meant to validate the user's input, so that they cannot enter anything other than AUGCT (upper or lower case). However, if I do enter anything else, I get a very long error message, when all I want is for the program to restart the function validation_check().

Also, if the user does enter a valid sequence, for some reason my code does not translate the valid RNA sequence into a protein sequence. I think it may have something to do with the chunks function, which splits the string in input_rna into chunks of 3 letters.

import re

input_rna = input("Type RNA sequence: ")

def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

def translate():
    amino_acids = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
        "UCU":"S", "UCC":"s", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

    translated = "".join(amino_acids[i] for i in chunks("".join(input_rna), 3))

def validation_check():
    global input_rna
    if re.match(r"[A, U, G, C, T, a, u, g, c, t]", input_rna):
        print("Correct! That is a valid sequence.")
        translate()
    else:
        print("That is not a valid RNA sequence, please try again.")
        validation_check()
validation_check()

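Your regular expression is wrong: inside [...] the commas and spaces are matched literally, and without anchors re.match only checks the first character of the string. Try: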

if re.match(r"^[AUGCT]+$", input_rna, re.IGNORECASE):

The following is better, because RNA contains uracil instead of thymine:

if re.match(r"^[AUGC]+$", input_rna, re.IGNORECASE):
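To make the difference concrete, here is a small sketch in Python 3. The names ORIGINAL, RNA_PATTERN and is_valid_rna are illustrative and not part of the original code, and it uses re.fullmatch as one way of requiring the whole string to match.

```python
import re

# The original character class: commas and spaces inside [...] are literal
# members of the class, and re.match only tests the start of the string.
ORIGINAL = re.compile(r"[A, U, G, C, T, a, u, g, c, t]")

# Whole-string check; RNA_PATTERN and is_valid_rna are illustrative names.
RNA_PATTERN = re.compile(r"[AUGC]+", re.IGNORECASE)

def is_valid_rna(sequence):
    """Return True if every character of sequence is A, U, G or C."""
    return RNA_PATTERN.fullmatch(sequence) is not None

print(bool(ORIGINAL.match("AXYZ")))    # True: only the first character is checked
print(bool(ORIGINAL.match(", oops")))  # True: a comma is in the class
print(is_valid_rna("AXYZ"))            # False
print(is_valid_rna("augc"))            # True
```

The empty string is rejected as well, because + requires at least one base.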

Note: the translation step also has a problem. If you evaluate

list(chunks("".join(input_rna), 3))

you get:

['ACG', 'AUG', 'AGU', 'CAU', 'GCU', 'U']

The problem is the last chunk: the input "ACGAUGAGUCAUGCUU" has a length that is not a multiple of 3, so the final chunk 'U' is not a complete codon and is not a key in amino_acids.

Solution:

"".join(amino_acids[i] for i in chunks("".join(input_rna), 3) if len(i)==3)
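The chunking behaviour can be checked on its own, without the codon table. This is a minimal Python 3 sketch using the same chunks logic as the question; the variable names are only for illustration.

```python
def chunks(sequence, size):
    """Yield successive slices of `size` characters from `sequence`."""
    for start in range(0, len(sequence), size):
        yield sequence[start:start + size]

rna = "ACGAUGAGUCAUGCUU"  # 16 bases, so one base is left over

all_chunks = list(chunks(rna, 3))
codons = [c for c in all_chunks if len(c) == 3]  # keep complete codons only

print(all_chunks)  # ['ACG', 'AUG', 'AGU', 'CAU', 'GCU', 'U']
print(codons)      # ['ACG', 'AUG', 'AGU', 'CAU', 'GCU']
```

Filtering on len(c) == 3 silently drops the trailing bases; whether that or an explicit error is preferable depends on how strict you want the program to be.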

In addition to the other problems pointed out, your validation_check function does not allow the user to input the string again. This means it keeps trying to validate the same string over and over without ever changing it, until Python hits its recursion limit, which is the source of the very long error message.

What you probably want is something more like:

def validation_check():
    input_rna = raw_input("Type RNA sequence: ").upper()
    if re.match(r"^[AUGCT]+$", input_rna):
        print("Correct! That is a valid sequence.")
        print translate(input_rna)
    else:
        print("That is not a valid RNA sequence, please try again.")
        validation_check()

This avoids using a global, lets the user re-enter the sequence, and doesn't automatically cause an infinite loop.

(Even so, using recursion here is probably a bad idea, so you should think about implementing this as a while loop instead.)
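A while-loop version might look like the sketch below. It is written for Python 3, where input() behaves like Python 2's raw_input(). The function name read_rna and its read parameter are hypothetical, and accepting only A, U, G and C is an assumption; add T to the character class if you want to keep allowing it.

```python
import re

def read_rna(prompt="Type RNA sequence: ", read=input):
    """Keep asking until the user types a sequence made only of A, U, G, C.

    `read` defaults to the built-in input(); it is a parameter only so the
    loop can be exercised without a keyboard.
    """
    while True:
        sequence = read(prompt).upper()
        if re.fullmatch(r"[AUGC]+", sequence):
            print("Correct! That is a valid sequence.")
            return sequence
        print("That is not a valid RNA sequence, please try again.")
```

Calling read_rna() then returns a validated, upper-case string that can be passed to translate, and no call stack builds up however many times the user mistypes.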

You'll notice a couple of other things:

  • raw_input instead of input, since in Python 2 the latter has an implicit eval. You want to steer clear of that unless you absolutely need it. (In Python 3, input already behaves like raw_input.)
  • .upper(), so you have standardized strings to verify and key off of. Since your dictionary only uses uppercase keys, this makes more sense than using re.I as recommended elsewhere.
  • I had translate return the translated protein, and then printed it. You may want to do something else.

I also added a default to your dictionary lookup:

translated = "".join(amino_acids.get(i, '!') for i in chunks("".join(rna), 3)) 

This way, you can keep processing if you get something weird, rather than having to deal with a KeyError (which will be raised if the user inputs a sequence you don't have a key for, like 'CUT').
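The difference between the two lookups can be seen with a deliberately tiny table. This is only a Python 3 sketch; the three-entry dictionary and the variable names strict and lenient are made up for the demonstration.

```python
# A deliberately tiny codon table, just enough for the demonstration.
amino_acids = {"AUG": "M", "CUU": "L", "UAA": "STOP"}

codons = ["AUG", "CUT", "CUU"]  # 'CUT' is not a key in the table

# Plain indexing raises KeyError on the first unknown codon.
try:
    strict = "".join(amino_acids[c] for c in codons)
except KeyError as err:
    strict = None
    print("unknown codon:", err)

# .get() substitutes a placeholder and carries on.
lenient = "".join(amino_acids.get(c, "!") for c in codons)
print(lenient)  # M!L
```

The '!' placeholder keeps the output aligned with the input codons, which makes it easy to see where the unknown codon was.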

I also noticed that you allow, but don't translate, the base 'T'. You may want to look into that.
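There are at least two ways to deal with this. One is to drop T from the regular expression so that such input is rejected. The other, sketched below in Python 3, assumes that a T in the input is simply a DNA-style spelling of the same position and rewrites it to U before validation and translation. The helper name normalize_bases is hypothetical, and whether this policy is appropriate depends on what the program is for.

```python
def normalize_bases(sequence):
    """Upper-case the input and rewrite T as U (an assumed policy)."""
    return sequence.upper().replace("T", "U")

print(normalize_bases("augctt"))  # AUGCUU
```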

Anyway, the complete code I wound up with is:

import re

def chunks(l, n):
    # yield successive n-character slices of l
    for i in range(0, len(l), n):
        yield l[i:i+n]

def translate(rna):
    amino_acids = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
        "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}
    translated = "".join(amino_acids.get(i, '!') for i in chunks("".join(rna), 3)) 
    return translated

def validation_check():
    input_rna = raw_input("Type RNA sequence: ").upper()
    if re.match(r"^[AUGCT]+$", input_rna):
        print("Correct! That is a valid sequence.")
        print translate(input_rna)
    else:
        print("That is not a valid RNA sequence, please try again.")
        validation_check()

# in case you ever need to import this, don't always call validation_check
if __name__ == "__main__":
    validation_check()
