
Python recursion with bubble sort

So, I have this problem where I receive two strings over the letters ACGT: one contains only letters, the other contains letters and dashes ("-"). Both are the same length. The string with the dashes is compared to the string without them, position by position, and each pairing is scored. For example, for dna1: -ACA and dna2: TACG the score is -1: the dash against a letter (T) gives -2, each letter against the same letter gives +1 (A to A, C to C), and two different letters (A to G) give -1, so the sum is -1. I wrote this code for the scoring system:

def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length, position by position."""
    score = 0
    for index in range(len(dna1)):
        # Compare by value (==); "is" tests object identity, not equality.
        if dna1[index] == dna2[index]:
            score += match
        elif "-" not in (dna1[index], dna2[index]):
            score += mismatch
        else:
            score += gap
    return score

This is working fine.
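As a quick check of the worked example, here is a minimal self-contained sketch of the same scoring rule. It iterates with `zip` instead of indexing, which is only a stylistic choice; the default penalties are the ones from the question.

```python
def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length, position by position."""
    score = 0
    for a, b in zip(dna1, dna2):
        if a == b:
            score += match      # same symbol in both strings
        elif "-" in (a, b):
            score += gap        # a dash against a letter
        else:
            score += mismatch   # two different letters
    return score


print(get_score("-ACA", "TACG"))  # -2 + 1 + 1 - 1 = -1
```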

Now I have to use recursion to find the best possible score for two strings. I receive two strings, which can be of different lengths this time (I can't change the order of the letters). So I wrote this code, which adds "-" to the shorter string as many times as needed to make both strings the same length, and puts the dashes at the start of the list. Now I want to start moving the dashes and record the score for every dash position, and finally get the highest possible score. For moving the dashes around I wrote a little bubble sort, but it doesn't seem to do what I want. I realize it's a long question, but I'd love some help. Let me know if anything I wrote is unclear.

def best_score(dna1, dna2, match=1, mismatch=-1, gap=-2,
               score=[], count=0):
    """Collect scores for shifted dash positions and return the best one."""
    diff = abs(len(dna1) - len(dna2))

    if len(dna1) == len(dna2):
        short = []
    elif len(dna1) < len(dna2):
        short = list(dna1)
    else:
        short = list(dna2)

    for i in range(diff):
        short.insert(count, "-")

    for i in range(diff + count, len(short) - 1):
        if len(dna1) < len(dna2):
            score.append((get_score(short, dna2),
                          ''.join(short), dna2))
        else:
            score.append((get_score(dna1, short),
                          dna1, ''.join(short)))
        short[i + 1], short[i] = short[i], short[i + 1]

    if count == min(len(dna1), len(dna2)):
        return score[score.index(max(score))]
    return best_score(dna1, dna2, 1, -1, -2, score, count + 1)
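For reference, the exhaustive search described in the question (try every dash placement, keep the best score) can be sketched without any swapping logic. This is not a fix of the code above, just a minimal illustration under the question's rules; the name `best_score_bruteforce` is hypothetical, and it uses `itertools.combinations` rather than recursion to enumerate the placements.

```python
from itertools import combinations


def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Score two aligned sequences of equal length, position by position."""
    score = 0
    for a, b in zip(dna1, dna2):
        if a == b:
            score += match
        elif "-" in (a, b):
            score += gap
        else:
            score += mismatch
    return score


def best_score_bruteforce(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Try every dash placement in the shorter string.

    Returns (score, aligned_dna1, aligned_dna2) for the best placement.
    """
    swapped = len(dna1) > len(dna2)
    short, longer = (dna2, dna1) if swapped else (dna1, dna2)
    best = None
    # Pick which positions of the longer string receive the letters of the
    # shorter one (in order); every other position becomes a dash.
    for positions in combinations(range(len(longer)), len(short)):
        padded = ["-"] * len(longer)
        for pos, base in zip(positions, short):
            padded[pos] = base
        padded = "".join(padded)
        pair = (longer, padded) if swapped else (padded, longer)
        candidate = (get_score(pair[0], pair[1], match, mismatch, gap),
                     pair[0], pair[1])
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


print(best_score_bruteforce("ACA", "TACG"))  # (-1, '-ACA', 'TACG')
```

The number of placements grows combinatorially with the string lengths, so this is only practical for short inputs, but it is handy for checking a recursive solution.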

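First, if I have correctly deciphered your cost function, the gap penalty does not affect which alignment is best: the number of dashes is fixed, so the gaps contribute the same amount to every candidate.

Second, the score depends linearly on the number of mismatches, so the best alignment does not depend on the exact values of match and mismatch either, as long as they are positive and negative respectively.

So your task reduces to finding the longest subsequence of the longer string's letters that strictly matches a subsequence of the shorter string's letters (with every letter of the shorter string still getting its own position, in order, in the longer one).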

Third, let M(string, substr) be the function returning the length of the best match described above. If the first letter of your shorter string is S, that is substr == 'S<letters>', then

M(string, 'S<letters>') = \
    max(1 + M(string[string.index(S) + 1:], '<letters>'),  # found S
        M(string[1:], '<letters>'))  # letter S not matched, placed at 1st place

The latter is an easy-to-implement recursive expression.
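A minimal sketch of that recursion in Python might look as follows. The name `max_matches` is hypothetical, and two details are assumptions on my part rather than part of the answer: the slice after a found S starts one position past it, and a branch is discarded (scored as minus infinity) when the remaining part of the longer string is too short to hold the remaining letters.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def max_matches(string, substr):
    """Most letters of substr that can sit on equal letters of string.

    Every letter of substr must get its own position in string, in order.
    """
    if len(substr) > len(string):
        return float("-inf")        # not enough room left: invalid branch
    if not substr:
        return 0
    first, rest = substr[0], substr[1:]
    # Place the first letter at position 0 (counts only if it matches there).
    best = (string[0] == first) + max_matches(string[1:], rest)
    # Or match it at its first occurrence in string, if there is one.
    pos = string.find(first)
    if pos != -1:
        best = max(best, 1 + max_matches(string[pos + 1:], rest))
    return best


print(max_matches("TACG", "ACA"))  # 2 (A on A, C on C)
```

The `lru_cache` decorator memoises repeated sub-problems; the recursion is still one level deep per letter of `substr`, so very long inputs would need an iterative version.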

For a pair string, substr with m = M(string, substr), the best score is equal to

m * match + (len(substr) - m) * mismatch + (len(string)-len(substr)) * gap
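As a small worked instance of this formula, take string = TACG and substr = ACA from the question, where two letters can be matched (m = 2). The helper name `score_from_matches` is hypothetical, and the default penalties are the ones used in the question.

```python
def score_from_matches(m, string, substr, match=1, mismatch=-1, gap=-2):
    """Best score given m matched letters of substr inside string."""
    return (m * match
            + (len(substr) - m) * mismatch
            + (len(string) - len(substr)) * gap)


print(score_from_matches(2, "TACG", "ACA"))  # 2*1 + 1*(-1) + 1*(-2) = -1
```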

It is straightforward to find exactly what the best match is by recording which option gave the max in the recursive expression.
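One way to record that choice is to let the recursion return the padded string together with the match count. The sketch below is an assumption about how this could be done, not the answer's own code; `best_alignment` is a hypothetical name, and it returns `None` when the shorter string does not fit.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def best_alignment(string, substr):
    """Return (matches, padded), padded being substr with dashes inserted
    so that it has the same length as string; None if substr is too long."""
    if len(substr) > len(string):
        return None
    if not substr:
        return 0, "-" * len(string)
    first, rest = substr[0], substr[1:]
    candidates = []
    # Option 1: put the first letter at position 0.
    tail = best_alignment(string[1:], rest)
    if tail is not None:
        candidates.append((tail[0] + (string[0] == first), first + tail[1]))
    # Option 2: skip ahead with dashes to the first occurrence of the letter.
    pos = string.find(first)
    if pos > 0:
        tail = best_alignment(string[pos + 1:], rest)
        if tail is not None:
            candidates.append((tail[0] + 1, "-" * pos + first + tail[1]))
    return max(candidates, key=lambda c: c[0])


print(best_alignment("TACG", "ACA"))  # (2, '-ACA')
```

Option 1 is always available once the length check has passed, so `candidates` is never empty when `max` is called.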


 