Never-ending loop? Can't get python to stop running

Question

When I try to run this code, it never finishes. I think it's stuck somewhere, but I'm not sure where since I am new to Python.

import re
codon = []
rcodon = []

dataset = "ggtcagaaaaagccctctccatgtctactcacgatacatccctgaaaaccactgaggaagtggcttttcagatcatcttgctttgccagtttggggttgggacttttgccaatgtatttctctttgtctataatttctctccaatctcgactggttctaaacagaggcccagacaagtgattttaagacacatggctgtggccaatgccttaactctcttcctcactatatttccaaacaacatgatga"
startcodon=0
n=0
print ("DNA sequence: ", dataset)

def find_codon(codon, string, start):
    i = start + 3
    while i < len(string):
        i = string.find(codon, i) # find the next substring
        if (i - start) % 3 == 0:  # check that it's a multiple of 3 after start
            return i
    return None

while(n < 1):
    startcodon=dataset.find("atg", startcodon)
    #locate stop codons
    taacodon=find_codon("taa", dataset, startcodon)
    tagcodon=find_codon("tag", dataset, startcodon)
    tgacodon=find_codon("tga", dataset, startcodon)

    stopcodon = min(taacodon, tagcodon, tgacodon)
    codon.append(dataset[startcodon:stopcodon+3])
    if(startcodon > len(dataset) or startcodon < 0):
        n = 2;
    startcodon=stopcodon
#reverse the string and swap the letters
n=0;
while(n < len(codon)):
        rcodon.append (codon[n][len(codon[n])::-1])
        #replace a with u
        rcodon[n] = re.sub('a', "u", rcodon[n])
        #replace t with a
        rcodon[n] = re.sub('t', "a", rcodon[n])
        #replace c with x
        rcodon[n] = re.sub('c', "x", rcodon[n])
        #replace g with c
        rcodon[n] = re.sub('g', "c", rcodon[n])
        #replace x with g
        rcodon[n] = re.sub('x', "g", rcodon[n])
        print("DNA sequence: ", codon[n] ,'\n', "RNA sequence:", rcodon[n])
        n=n+1
answer = 0
print("Total Sequences:  ", len(codon)-3)
while (int(answer) >=0):
        #str = "Please enter an integer from 0 to " + str(len(dataset)) + " or -1 to quit: "
        answer = int(input("Please enter a sequence you would like to see or -1 to quit:  "))
        if(int(answer) >= 0):
                print("DNA sequence: ", codon[int(answer)] ,'\n', "RNA sequence:", rcodon[int(answer)])

Any advice would be helpful.

This is a project about transcribing DNA WITHOUT Biopython. The goal: create a program that locates the 'atg' in a DNA sequence and then finds the stop sequence (tga, taa, or tag) while counting in threes from the initial atg.

Edit: I want the program to give me the sequences between atg and a stop codon, like my original code does. My original code, however, did not move in steps of 3 from the atg to find the correct stop sequence.

My original code:

import re
codon = []
rcodon = []


dataset = "ggtcagaaaaagccctctccatgtctactcacgatacatccctgaaaaccactgaggaagtggcttttcagatcatcttgctttgccagtttggggttgggacttttgccaatgtatttctctttgtctataatttctctccaatctcgactggttctaaacagaggcccagacaagtgattttaagacacatggctgtggccaatgccttaactctcttcctcactatatttccaaacaacatgatga"
startcodon=0
n=0
while(n < 1):
    startcodon=dataset.find("atg", startcodon, len(dataset)-startcodon)
    #locate stop codons
    taacodon=dataset.find("taa", startcodon+3, len(dataset)-startcodon)
    tagcodon=dataset.find("tag", startcodon+3, len(dataset)-startcodon)
    tgacodon=dataset.find("tga", startcodon+3, len(dataset)-startcodon)
    if(taacodon<tagcodon):
        if(taacodon<tgacodon):
            stopcodon=taacodon
            #print("taacodon", startcodon)
        else:
            stopcodon=tgacodon
            #print("tGacodon", startcodon)

    elif(tgacodon>tagcodon):
        stopcodon=tagcodon
        #print("taGcodon", startcodon)
    else:
        stopcodon=tgacodon
        #print("tGacodon", startcodon)
    #to add sequences to an array
    codon.append(dataset[startcodon:stopcodon+3])
    if(startcodon > len(dataset) or startcodon < 0):
        n = 2;
    startcodon=stopcodon

#reverse the string and swap the letters
n=0;
while(n < len(codon)):
        rcodon.append (codon[n][len(codon[n])::-1])
        #replace a with u
        rcodon[n] = re.sub('a', "u", rcodon[n])
        #replace t with a
        rcodon[n] = re.sub('t', "a", rcodon[n])
        #replace c with x
        rcodon[n] = re.sub('c', "x", rcodon[n])
        #replace g with c
        rcodon[n] = re.sub('g', "c", rcodon[n])
        #replace x with g
        rcodon[n] = re.sub('x', "g", rcodon[n])
        print("DNA sequence: ", codon[n] ,'\n', "RNA sequence:", rcodon[n])
        n=n+1
answer = 0
print("Total Sequences:  ", len(codon)-3)
while (int(answer) >= 0):
        #str = "Please enter an integer from 0 to " + str(len(dataset)) + " or -1 to quit: "
        answer = int(input("Please enter an sequence you would like to see or -1 to quit:  "))
        if(int(answer) >= 0):
                print("DNA sequence: ", codon[int(answer)] ,'\n', "RNA sequence:", rcodon[int(answer)])

Answer 1

The endless loop comes from your function. When string.find(codon, i) returns a match at i that is not a multiple of 3 away from start, i is left unchanged, so the next call to string.find(codon, i) returns the same i value again. You should add 3 to i in that case, so the search moves past the match. The correction should be:

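def find_codon(codon, string, start):
    i = start + 3
    while i < len(string):
        i = string.find(codon, i) # find the next substring
        if i == -1:               # find returns -1 when there is no further match
            break
        if (i - start) % 3 == 0:  # check that it's a multiple of 3 after start
            return i
        else:
            i += 3                # move past this match so find does not return it again
    return None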

You will then have a problem with using min on a None value, and get the following error:

stopcodon = min(taacodon, tagcodon, tgacodon)
TypeError: '<' not supported between instances of 'NoneType' and 'int'

Instead of None, you should return some large number that indicates nothing was found.
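
A minimal sketch of that suggestion, assuming the length of the string is used as the "not found" value. The sentinel choice, the made-up test sequence, and the rewritten loop are assumptions of this sketch, not part of the original code:

```python
def find_codon(codon, string, start):
    """Return the index of the first in-frame `codon` after `start`.

    Returns len(string) when nothing is found, so the result can be
    passed to min() together with the results for the other codons.
    """
    not_found = len(string)  # assumed sentinel: larger than any valid index
    i = string.find(codon, start + 3)
    while i != -1:
        if (i - start) % 3 == 0:  # same reading frame as the start codon
            return i
        i = string.find(codon, i + 1)  # keep searching past this match
    return not_found


seq = "atgctaacctgaaaa"  # made-up sequence; codons: atg cta acc tga aaa
taa = find_codon("taa", seq, 0)  # only an out-of-frame "taa" at index 4
tag = find_codon("tag", seq, 0)  # no "tag" at all
tga = find_codon("tga", seq, 0)  # in-frame "tga" at index 9
stop = min(taa, tag, tga)        # works because every value is an int

if stop < len(seq):
    print(seq[0:stop + 3])
```

Because every result is an integer, min never sees None, and a result equal to len(seq) means that no in-frame stop codon exists.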

Answer 2

There are multiple problems with the above code. I'm going to use the original since that was added in the edit (so I assume it's the most recent).

dataset = "ggtcagaaaaagccctctccatgtctactcacgatacatccctgaaaaccactgaggaagtggcttttcagatcatcttgctttgccagtttggggttgggacttttgccaatgtatttctctttgtctataatttctctccaatctcgactggttctaaacagaggcccagacaagtgattttaagacacatggctgtggccaatgccttaactctcttcctcactatatttccaaacaacatgatga"
startcodon=0
n=0
while(n < 1):
    startcodon=dataset.find("atg", startcodon, len(dataset)-startcodon)
    #locate stop codons
    taacodon=dataset.find("taa", startcodon+3, len(dataset)-startcodon)
    tagcodon=dataset.find("tag", startcodon+3, len(dataset)-startcodon)
    tgacodon=dataset.find("tga", startcodon+3, len(dataset)-startcodon)

This is not jumping by groups of 3. find goes through the string one character at a time and returns the position of the first match, whatever its offset from the start codon. That is why you will always get the same value no matter what.
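
To make the difference concrete, here is a small sketch on a made-up sequence (not your dataset). str.find reports the first match at any offset, while stepping with range(..., 3) only visits positions in the same reading frame as the start codon. Note too that the third argument of str.find is an end index rather than a length, so len(dataset)-startcodon shrinks the search window as startcodon grows.

```python
seq = "atgctaacctgaaaa"  # made-up example; codons: atg cta acc tga aaa
start = seq.find("atg")

# str.find checks every offset, so it reports the out-of-frame "taa" at index 4
first_any_offset = seq.find("taa", start + 3)

# Stepping by 3 from the start codon only visits in-frame positions
in_frame_stops = [
    i for i in range(start + 3, len(seq) - 2, 3)
    if seq[i:i + 3] in {"taa", "tag", "tga"}
]

# The third argument is an end index: only seq[3:8] is searched here
limited = seq.find("tga", start + 3, 8)

print(first_any_offset, in_frame_stops, limited)
```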

if(taacodon<tagcodon):
        if(taacodon<tgacodon):
            stopcodon=taacodon
            #print("taacodon", startcodon)
        else:
            stopcodon=tgacodon
            #print("tGacodon", startcodon)

    elif(tgacodon>tagcodon):
        stopcodon=tagcodon
        #print("taGcodon", startcodon)
    else:
        stopcodon=tgacodon
        #print("tGacodon", startcodon)

I presume this is aimed at finding the first stop codon. However, find returns -1 if it can't locate the string. Since you don't have tag, its -1 is always the smallest value, so it will always be picked as the stop codon even though it doesn't exist.
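
One way around this, sketched under the assumption that a codon that is not found should simply be ignored: collect the results of find, drop the -1 values, and only then take the minimum. The helper name first_stop and the test strings are made up for this illustration, and the helper ignores the reading frame on purpose so that it isolates the -1 issue.

```python
def first_stop(seq, begin):
    """Return the smallest index >= begin of any stop codon, or None."""
    hits = [seq.find(stop, begin) for stop in ("taa", "tag", "tga")]
    found = [i for i in hits if i != -1]  # discard codons that do not occur
    return min(found) if found else None


print(first_stop("atgccctgaaataa", 3))  # "tga" at 6 comes before "taa" at 11
print(first_stop("atgcccggg", 3))       # no stop codon at all
```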

n=0;
while(n < len(codon)):
        rcodon.append (codon[n][len(codon[n])::-1])
        #replace a with u
        rcodon[n] = re.sub('a', "u", rcodon[n])
        #replace t with a
        rcodon[n] = re.sub('t', "a", rcodon[n])
        #replace c with x
        rcodon[n] = re.sub('c', "x", rcodon[n])
        #replace g with c
        rcodon[n] = re.sub('g', "c", rcodon[n])
        #replace x with g
        rcodon[n] = re.sub('x', "g", rcodon[n])
        print("DNA sequence: ", codon[n] ,'\n', "RNA sequence:", rcodon[n])
        n=n+1

Use dicts and f-strings; they make this much cleaner. I also don't quite understand why you map c to x, and then x to g.
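
The x step looks like a placeholder: replacing c with g and then g with c one after the other would turn every c and g into c, so the chain parks c in x first. A mapping applied in a single pass does not need that trick. A minimal sketch using str.maketrans and str.translate on a made-up sequence, with the same base pairs as the re.sub chain:

```python
# Same base pairs as the re.sub chain: a->u, t->a, c->g, g->c
complement = str.maketrans({"a": "u", "t": "a", "c": "g", "g": "c"})

dna = "atgcctga"  # made-up example
rna = dna.translate(complement)  # every base is mapped in a single pass

# The question reverses the string first; the same table works for that too
rna_reversed = dna[::-1].translate(complement)

print(f"DNA sequence: {dna}\nRNA sequence: {rna}")
```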

Finally, your dataset does not contain a stop codon when counting in threes from the first atg, so it cannot be transcribed the way you want.

I have added a stop codon at the end of your dataset. To get the output you want, you can do this:

dataset = "ggtcagaaaaagccctctccatgtctactcacgatacatccctgaaaaccactgaggaagtggcttttcagatcatcttgctttgccagtttggggttgggacttttgccaatgtatttctctttgtctataatttctctccaatctcgactggttctaaacagaggcccagacaagtgattttaagacacatggctgtggccaatgccttaactctcttcctcactatatttccaaacaacatgtaaa"

rdict={'a':'u','t':'a','c':'g','g':'c'}
start_codon=dataset.find("atg")
for nucleotides in range(start_codon+3,len(dataset),3):
    if dataset[nucleotides:nucleotides+3] in {'taa','tag','tga'}:
        stop_codon=nucleotides
        break  # stop at the first in-frame stop codon
DNA=[]
RNA=[]
for bases in range(start_codon,stop_codon,1):
    DNA.append(dataset[bases])
    RNA.append(rdict[dataset[bases]])

print(f"DNA Sequence: {''.join(DNA)}\nRNA Sequence: {''.join(RNA)}")

while True:
    answer=input('\nplease input sequence you would like to see or exit to quit:  ')
    if answer == 'exit':
        break
    try:
        print(f'DNA Sequence: {DNA[int(answer)]}\nRNA Sequence: {RNA[int(answer)]}')
    except (ValueError, IndexError):  # not a number, or number out of range
        print('Entry invalid, please input number')

(You can simplify this further with a list comprehension, but I've written out the two loops so you can get the general idea.)
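
As a rough sketch of that shorter form, assuming the same rdict mapping and a made-up sequence in place of the full dataset: next with a generator expression picks the first in-frame stop codon, and join over a generator builds the RNA string. The None default for "no stop codon found" is an assumption of this sketch.

```python
rdict = {'a': 'u', 't': 'a', 'c': 'g', 'g': 'c'}
stops = {'taa', 'tag', 'tga'}

dataset = "ggatgctaacctgaaaa"  # made-up example, not the original dataset
start_codon = dataset.find("atg")

# First in-frame stop codon after the start codon, or None if there is none
stop_codon = next(
    (i for i in range(start_codon + 3, len(dataset) - 2, 3)
     if dataset[i:i + 3] in stops),
    None,
)

if stop_codon is not None:
    dna = dataset[start_codon:stop_codon]        # slice instead of a loop
    rna = ''.join(rdict[base] for base in dna)   # complement each base
    print(f"DNA Sequence: {dna}\nRNA Sequence: {rna}")
```

Here dna and rna are plain strings rather than lists of single bases, so indexing them returns one base at a time, just as with the lists in the longer version.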
