How to use a dictionary to modify a string with Python?

I have a string and a dictionary. I need to replace parts of the string with the corresponding values from the dictionary, matching on the dictionary keys.

Given string:

rna = "AUGCAUGUACCGAAUGCUGAGGGGGCUUCCUAA"

Given dictionary:

amino_acids = {"UUU" : "Phe", "UUC" : "Phe", "UUA" : "Leu", "UUG" : "Leu",
                "CUU" : "Leu", "CUC" : "Leu", "CUA" : "Leu", "CUG" : "Leu",
                "AUU" : "Ile", "AUC" : "Ile", "AUA" : "Ile", "AUG" : "Met",
                "GUU" : "Val", "GUC" : "Val", "GUA" : "Val", "GUG" : "Val",
                "UCU" : "Ser", "UCC" : "Ser", "UCA" : "Ser", "UCG" : "Ser",
                "CCU" : "Pro", "CCC" : "Pro", "CCA" : "Pro", "CCG" : "Pro",
                "ACU" : "Thr", "ACC" : "Thr", "ACA" : "Thr", "ACG" : "Thr",
                "GCU" : "Ala", "GCC" : "Ala", "GCA" : "Ala", "GCG" : "Ala",
                "UAU" : "Tyr", "UAC" : "Tyr", "UAA" : "STOP", "UAG" : "STOP",
                "CAU" : "His", "CAC" : "His", "CAA" : "Gln", "CAG" : "Gln",
                "AAU" : "Asn", "AAC" : "Asn", "AAA" : "Lys", "AAG" : "Lys",
                "GAU" : "Asp", "GAC" : "Asp", "GAA" : "Glu", "GAG" : "Glu",
                "UGU" : "Cys", "UGC" : "Cys", "UGA" : "STOP", "UGG" : "Trp",
                "CGU" : "Arg", "CGC" : "Arg", "CGA" : "Arg", "CGG" : "Arg",
                "AGU" : "Ser", "AGC" : "Ser", "AGA" : "Arg", "AGG" : "Arg",
                "GGU" : "Gly", "GGC" : "Gly", "GGA" : "Gly", "GGG" : "Gly"
                  }

Expected output:

Met-His-Val-Pro-Asn-Ala-Glu-Gly-Ala-Ser-*

Question: What am I doing wrong? How can this be done without importing any modules? Thanks!

EDIT:

Solution

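def rna_to_protein(rna):

    # Split the sequence into 3-letter codons
    acids = [rna[i:i+3] for i in range(0, len(rna), 3)]

    # Look up each codon and join the amino acid names with "-"
    protein = "-".join(amino_acids[acid] for acid in acids)
    # Show the stop codon as "*"
    protein = protein.replace("STOP", "*")

    return protein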
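To check the solution end to end, here is a self-contained sketch. It assumes a cut-down codon table (`CODONS`, a hypothetical name) holding only the ten codons that occur in the example RNA, and passes the table in as a parameter instead of reading the global `amino_acids`:

```python
# Subset of the question's amino_acids table: only the codons that occur
# in the example RNA (an assumption made to keep the example short).
CODONS = {
    "AUG": "Met", "CAU": "His", "GUA": "Val", "CCG": "Pro", "AAU": "Asn",
    "GCU": "Ala", "GAG": "Glu", "GGG": "Gly", "UCC": "Ser", "UAA": "STOP",
}


def rna_to_protein(rna, amino_acids):
    """Translate rna codon by codon; assumes len(rna) is a multiple of 3."""
    # Split the sequence into consecutive 3-letter codons
    acids = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    # Look each codon up and join the names with hyphens
    protein = "-".join(amino_acids[acid] for acid in acids)
    # Show the stop codon as "*", like the expected output
    return protein.replace("STOP", "*")


print(rna_to_protein("AUGCAUGUACCGAAUGCUGAGGGGGCUUCCUAA", CODONS))
# prints Met-His-Val-Pro-Asn-Ala-Glu-Gly-Ala-Ser-*
```

With the full `amino_acids` dictionary from the question passed as the second argument, the function should behave the same for any complete RNA string.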

You can split the rna string into individual strings of length 3:

>>> rna = "AUGCAUGUACCGAAUGCUGAGGGGGCUUCCUAA"
>>> acids = [rna[i:i+3] for i in range(0, len(rna), 3)]
>>> acids
['AUG', 'CAU', 'GUA', 'CCG', 'AAU', 'GCU', 'GAG', 'GGG', 'GCU', 'UCC', 'UAA']
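A different way to do the same split, not part of this answer, is the `zip(*[iter(rna)] * 3)` grouping idiom: all three arguments to `zip` are the same iterator, so each tuple holds three consecutive characters, and a trailing group shorter than 3 is silently dropped.

```python
rna = "AUGCAUGUACCGAAUGCUGAGGGGGCUUCCUAA"

# Three references to one shared iterator, so zip takes consecutive characters;
# zip stops at the shortest input, which drops an incomplete last codon
acids = ["".join(chunk) for chunk in zip(*[iter(rna)] * 3)]
print(acids)
# prints ['AUG', 'CAU', 'GUA', 'CCG', 'AAU', 'GCU', 'GAG', 'GGG', 'GCU', 'UCC', 'UAA']
```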

Then you can use them to look up the corresponding amino acids in the dictionary:

>>> "-".join(amino_acids[acid] for acid in acids)
'Met-His-Val-Pro-Asn-Ala-Glu-Gly-Ala-Ser-STOP'
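To get `*` directly instead of `STOP`, the stop codon can be mapped inside the generator expression rather than by post-processing the joined string. A minimal sketch, using a codon subset in place of the full table and a hypothetical `STOP_SYMBOL` constant:

```python
# Codon subset for this example only; use the full amino_acids table in practice
amino_acids = {
    "AUG": "Met", "CAU": "His", "GUA": "Val", "CCG": "Pro", "AAU": "Asn",
    "GCU": "Ala", "GAG": "Glu", "GGG": "Gly", "UCC": "Ser", "UAA": "STOP",
}
STOP_SYMBOL = "*"  # hypothetical name for the stop marker

rna = "AUGCAUGUACCGAAUGCUGAGGGGGCUUCCUAA"
acids = [rna[i:i + 3] for i in range(0, len(rna), 3)]

# Swap "STOP" for the marker per codon instead of on the joined string
protein = "-".join(
    STOP_SYMBOL if amino_acids[acid] == "STOP" else amino_acids[acid]
    for acid in acids
)
print(protein)
# prints Met-His-Val-Pro-Asn-Ala-Glu-Gly-Ala-Ser-*
```

Checking each lookup only touches the stop entries; with this table the result is the same as calling `.replace("STOP", "*")` on the joined string.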

Keeping it simple...

for key in amino_acids:
    rna = rna.replace(key, amino_acids[key])

The simplest way would be to take every 3 characters, find the corresponding value from the dictionary and append it to the result string.

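(This assumes the RNA is always complete, i.e. its length is a multiple of 3. Otherwise the last slice is shorter than 3 characters and the dictionary lookup raises a KeyError.)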

def rna_to_proteins(rna, amino_acids):

    result = ""
    i = 0

    # Traverse through every 3 characters and match with dictionary values
    while i < len(rna):
        # Get the next 3 characters (one codon)
        sequence = rna[i:i+3]

        # Use != to compare values; "is not" tests object identity
        if amino_acids[sequence] != "STOP":
            result += amino_acids[sequence]
            result += "-"
        else:
            result += "*"

        i += 3

    # Finally print the result string
    print(result)
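A variant of the loop above, as a sketch under two assumptions not in the original answer: translation should end at the first stop codon, and one or two leftover bases after the last complete codon should be ignored rather than raise a `KeyError`. The name `translate` and the codon subset are illustrative:

```python
# Codon subset for this example only; use the full amino_acids table in practice
amino_acids = {
    "AUG": "Met", "CAU": "His", "GUA": "Val", "CCG": "Pro", "AAU": "Asn",
    "GCU": "Ala", "GAG": "Glu", "GGG": "Gly", "UCC": "Ser", "UAA": "STOP",
}


def translate(rna, amino_acids):
    """Return the protein for rna, stopping at the first stop codon.

    Leftover bases after the last complete codon are ignored.
    """
    names = []
    # Largest multiple of 3 <= len(rna), so every slice is a full codon
    for i in range(0, len(rna) - len(rna) % 3, 3):
        name = amino_acids[rna[i:i + 3]]
        if name == "STOP":
            names.append("*")
            break  # translation ends at the stop codon
        names.append(name)
    return "-".join(names)


rna = "AUGCAUGUACCGAAUGCUGAGGGGGCUUCCUAA"
print(translate(rna, amino_acids))
# prints Met-His-Val-Pro-Asn-Ala-Glu-Gly-Ala-Ser-*
```

Collecting the names in a list and joining once also avoids a dangling `-` at the end when the RNA contains no stop codon, which the string-concatenation version would produce.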
