If statement with multiple “or” conditions?

Question

I am trying to write a program that loops through a string of RNA bases, finds the start codon ('AUG'), groups the following bases into codons of three (e.g. 'GAA', 'ACC'), looks up the corresponding amino acid in a dictionary, builds a string of the resulting amino acids, and keeps going until it hits a stop codon ('UAA', 'UGA', 'UAG'). RNA is read in groups of three, starting from a start codon and ending at a stop codon.

The problem is when I want the program to check to see if it has hit one of the three stop codons, it does not work if I have all three listed in the same if statement. When checking the dictionary, it will treat the stop codon as an unknown ( .get(codon, 'X') ) and list it as an 'X' in the protein:

a_seq = 'AAAAUGGAAUGAACC'
kmer_size = 3
for start in range (0,len(a_seq)- kmer_size+1,1):
    kmer = a_seq[start:start+kmer_size]
    if kmer == 'AUG':
        start_codon = a_seq.index(kmer)
        new_seq = a_seq[start_codon:]
        last_codon_start = len(new_seq) - 2
        dictionary = {'AUG':'M',
                     'GAA':'E',
                     'ACC':'T'}
        protein = ''
        for start in range(0, last_codon_start, 3):
            codon = new_seq[start:start+3]
            print(codon)
            if codon != 'UAA' or codon != 'UGA' or codon != 'UAG':
                amino_acid = dictionary.get(codon,'X')
                protein += amino_acid
            else:
                break
        print(protein)
        break

Output:

AUG
GAA
UAA
ACC
MEXT

If I only list a single stop codon, then it works:

if codon != 'UAA':

AUG
GAA
UAA
ME

Both proteins should be 'ME'. I expect it to stop as soon as it hits any of the three stop codons. What is wrong with my if statement?

Answer 1

This corrects the faulty line:

if codon != 'UAA' and codon != 'UGA' and codon != 'UAG':

If you test "not equal to x or not equal to y", the result is always true. Simplifying a bit:

if x != 1 or x !=2:

No matter what x is, this condition is true: no value can equal both 1 and 2 at once, so at least one of the two inequalities always holds. Even x = 1 passes, because 1 != 2.
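A quick check makes this concrete. The sketch below evaluates both versions of the condition for each stop codon and for one ordinary codon; the helper names passes_or and passes_and are made up for illustration and are not part of the original program.

```python
STOP_CODONS = ('UAA', 'UGA', 'UAG')

def passes_or(codon):
    # Original condition: true for every codon, because no codon can
    # equal all three stop codons at once.
    return codon != 'UAA' or codon != 'UGA' or codon != 'UAG'

def passes_and(codon):
    # Corrected condition: true only when the codon is none of the three.
    return codon != 'UAA' and codon != 'UGA' and codon != 'UAG'

for codon in STOP_CODONS + ('GAA',):
    print(codon, passes_or(codon), passes_and(codon))
# UAA True False
# UGA True False
# UAG True False
# GAA True True
```

With "or", the stop codons slip through to the dictionary lookup, which is why they show up as 'X' in the protein.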

But the clearest way to write this line is:

if codon not in ('UAA', 'UGA', 'UAG'):

Another option is to add the stop codons to your dictionary and map them to a marker value that triggers the break. This would address @Sam Mason's point about the efficiency of hash lookups and also save a step in the main loop.

        dictionary = {'AUG': 'M',
                      'GAA': 'E',
                      'ACC': 'T',
                      'UAA': '*',
                      'UGA': '*',
                      'UAG': '*',
        }
        protein = ''
        for start in range(0, last_codon_start, 3):
            codon = new_seq[start:start+3]
            print(codon)
            amino_acid = dictionary.get(codon,'X')
            if amino_acid == '*':
                break
            protein += amino_acid

Finally, the for loop could be simplified slightly by using the textwrap module from the standard library:

from textwrap import wrap
...
...
        for codon in wrap(new_seq, 3):
            print(codon)
            # ... rest of the loop body as before
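One thing to watch with wrap: if the sequence length is not a multiple of three, the last chunk is shorter than a codon. A minimal sketch, assuming the sequence contains no whitespace (wrap treats whitespace as a word boundary) and using a made-up 11-base sequence:

```python
from textwrap import wrap

new_seq = 'AUGGAAUGAAC'  # 11 bases: three full codons plus two leftover bases

chunks = wrap(new_seq, 3)
print(chunks)  # ['AUG', 'GAA', 'UGA', 'AC']

# Keep only complete codons so a trailing fragment is never looked up.
codons = [c for c in chunks if len(c) == 3]
print(codons)  # ['AUG', 'GAA', 'UGA']
```

The original range(0, last_codon_start, 3) loop never yields a partial codon, so the length filter is what keeps the two approaches equivalent.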

Answer 2

I think it would be more readable to reverse the logic of the inner if that checks for stop codons:

if codon == 'UAA' or codon == 'UGA' or codon == 'UAG':

However, it would be more efficient to do the equivalent by storing all the possibilities in a set, which makes checking for membership both simpler and faster.

Here's what I mean (note that I also took the creation of the constants out of the loop):

START_CODONS = {'AUG': 'M',
                'GAA': 'E',
                'ACC': 'T'}
STOP_CODONS = {'UAA', 'UGA', 'UAG'}

a_seq = 'AAAAUGGAAUGAACC'
kmer_size = 3

for start in range (0, len(a_seq)-kmer_size+1, 1):
    kmer = a_seq[start: start+kmer_size]
    if kmer == 'AUG':
        start_codon = a_seq.index(kmer)
        new_seq = a_seq[start_codon:]
        last_codon_start = len(new_seq) - 2
        protein = ''
        for start in range(0, last_codon_start, 3):
            codon = new_seq[start: start+3]
            print(codon)
#            if codon == 'UAA' or codon == 'UGA' or codon == 'UAG':
            if codon in STOP_CODONS:
                break
            else:
                amino_acid = START_CODONS.get(codon, 'X')
                protein += amino_acid
        print('protein:', protein)
        break

Output:

AUG
GAA
UGA
protein: ME
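The same idea can be packaged as a small function, which also removes the outer scanning loop. This is only a sketch: the names translate and CODON_TABLE are made up here, str.find stands in for the k-mer scan, and the three-entry table is the toy one from the question rather than a full genetic code.

```python
CODON_TABLE = {'AUG': 'M', 'GAA': 'E', 'ACC': 'T'}  # toy table from the question
STOP_CODONS = {'UAA', 'UGA', 'UAG'}

def translate(rna):
    """Translate from the first 'AUG' until a stop codon or the end of rna."""
    start = rna.find('AUG')
    if start == -1:
        return ''  # no start codon, nothing to translate
    protein = []
    # Stop two bases before the end so only complete codons are read.
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE.get(codon, 'X'))  # 'X' marks an unknown codon
    return ''.join(protein)

print(translate('AAAAUGGAAUGAACC'))  # ME
print(translate('AAAAUGGAAUAAACC'))  # ME
print(translate('AAAAUGGAAACC'))     # MET
```

Collecting amino acids in a list and joining once at the end is a common alternative to repeated string concatenation; for sequences this short either approach is fine.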

2 answers

solution1
1 2019-07-24 23:00:25

solution2
1 2019-07-24 23:41:59
