Only iterating through first list in a list of lists

def translateStrand1(dnaStrand):
    protein = []
    proteinList = []
    start = dnaStrand.find('ATG')
    end = len(dnaStrand)
    totalLength = dnaStrand[start:end]

    remove = totalLength.split('TAG')
    for str in remove:
        split_str = [str[i:i+3] for i in range(0, len(str), 3)]
        protein.append(split_str)
        print(protein)
        for list in protein:
            for i in list:
                protein = (aminoAcid(i))
                proteinList.append(protein)
            return proteinList

Current result:

>>> translateStrand1('ABCATGTATGCCTAGATGCTGCGCTAGATGGTTGCA')
[['ATG', 'TAT', 'GCC']]
['Met', 'Tyr', 'Ala']

Required result for the given string:

>>> translateStrand1('ABCATGTATGCCTAGATGCTGCGCTAGATGGTTGCA')
[['Met', 'Tyr', 'Ala'], ['Met', 'Leu', 'Arg'], ['Met', 'Val', 'Ala']]

Looks like only the first list in protein is being iterated over instead of all lists. Also I only need the corresponding abbreviation and not the letters from the string in my output.

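First, avoid using str and list as variable names. They are not reserved keywords, but they are the names of Python's built-in types, and reusing them shadows the built-ins for the rest of that scope, so a later call such as str(x) or list(x) will fail.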
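To make the shadowing problem concrete, here is a minimal sketch. The function names count_codons_shadowed and count_codons are hypothetical and only exist for this illustration; they are not part of the original code.

```python
def count_codons_shadowed(chunks):
    # BAD: the loop variable is named `str`, which hides the built-in type
    # for the rest of this function.
    for str in chunks:
        pass
    try:
        # `str` now refers to the last chunk (a string), not the type.
        return str(len(chunks))
    except TypeError as exc:
        return "TypeError: " + exc.args[0]


def count_codons(chunks):
    # GOOD: a descriptive loop name leaves the built-in `str` untouched.
    for sequence in chunks:
        pass
    return str(len(chunks))


shadowed_result = count_codons_shadowed(["ATG", "TAT"])
clean_result = count_codons(["ATG", "TAT"])
print(shadowed_result)
print(clean_result)
```

The first call reports a TypeError because a string is not callable, while the second returns the count as text.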

Second, your main problem is where you return the result. The return statement sits inside the outer for loop, so the function exits at the end of the first iteration, and that is why you only get the first list. Move it outside the loop, and collect the amino acids of each sequence in their own list, like so:

def translateStrand1(dnaStrand):
    proteinList = []
    # Start reading at the first start codon
    start = dnaStrand.find('ATG')
    totalLength = dnaStrand[start:]

    # Each 'TAG' stop codon ends one sequence
    remove = totalLength.split('TAG')
    for sequence in remove:
        # Cut the sequence into three-letter codons
        split_str = [sequence[i:i+3] for i in range(0, len(sequence), 3)]
        # Collect the amino acids of this sequence in their own list
        proteins_in_seq = []
        for codon in split_str:
            amino_acid = aminoAcid(codon)
            proteins_in_seq.append(amino_acid)
        proteinList.append(proteins_in_seq)
    # Return only after every sequence has been processed
    return proteinList

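Indentation is significant in Python: it determines which block a statement belongs to. A return indented under a for runs during the first pass of the loop, while a return aligned with the for runs only after the loop has finished.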
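The effect of the return placement can be seen in isolation with a small sketch. The functions first_only and all_groups are hypothetical names used only here; they count codons instead of translating them so that the example needs no lookup table.

```python
def first_only(groups):
    results = []
    for group in groups:
        results.append(len(group))
        return results  # indented inside the loop: exits on the first pass


def all_groups(groups):
    results = []
    for group in groups:
        results.append(len(group))
    return results  # aligned with `for`: runs after the loop has finished


groups = [["ATG", "TAT", "GCC"], ["ATG", "CTG"], ["ATG"]]
print(first_only(groups))  # [3]
print(all_groups(groups))  # [3, 2, 1]
```

The two functions differ only in the indentation of one line, which is the same difference as between the question's code and the fixed version.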

Lastly, you see the codons (the three-letter groups) as well as the amino acid abbreviations because of the print(protein) call inside the loop. If you don't want that output, remove that line.

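PS: In your original code the name protein is used for two different things: first the list of codon lists, then the return value of aminoAcid(i) inside the inner loop. Assuming aminoAcid returns a string, once the return is moved out of the loop the second pass would call protein.append(...) on that string and raise an AttributeError, so give the two values separate names.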
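For anyone who wants to run the fixed function end to end, here is a self-contained sketch. The question never shows aminoAcid, so the version below is a hypothetical stand-in backed by a small lookup table that covers only the codons in the example strand. Two further assumptions that go beyond the original code: incomplete trailing codons are dropped, and empty sequences (for example after a trailing 'TAG') are skipped.

```python
# Hypothetical stand-in for the asker's aminoAcid(); it covers only the
# codons that occur in the example strand, not the full genetic code.
CODON_TABLE = {
    "ATG": "Met", "TAT": "Tyr", "GCC": "Ala", "CTG": "Leu",
    "CGC": "Arg", "GTT": "Val", "GCA": "Ala",
}


def aminoAcid(codon):
    return CODON_TABLE[codon]


def translateStrand1(dnaStrand):
    start = dnaStrand.find("ATG")
    if start == -1:
        return []  # assumption: no start codon means nothing to translate
    proteinList = []
    # Same approach as the question: split on 'TAG' wherever it occurs.
    for sequence in dnaStrand[start:].split("TAG"):
        # Keep complete three-letter codons only (assumption).
        codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
        if codons:  # skip empty sequences (assumption)
            proteinList.append([aminoAcid(codon) for codon in codons])
    return proteinList


result = translateStrand1("ABCATGTATGCCTAGATGCTGCGCTAGATGGTTGCA")
print(result)
# [['Met', 'Tyr', 'Ala'], ['Met', 'Leu', 'Arg'], ['Met', 'Val', 'Ala']]
```

Because the split on 'TAG' ignores the reading frame, a 'TAG' that straddles two codons would also end a sequence. That behaviour is inherited from the question's approach and is not addressed here.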


 