
Python recursion with bubble sort

So, I have this problem where I receive 2 strings of the letters ACGT, one with only letters, the other containing letters and dashes "-". Both are the same length. The string with the dashes is compared to the string without them, cell by cell, and for each pairing I have a scoring system. For example, for dna1: -ACA and dna2: TACG the score is -1: a dash compared to a letter (T) gives -2, a letter compared to the same letter gives +1 (A to A) and +1 (C to C), and non-matching letters give -1 (A to G), so the sum is -1. I wrote this code for the scoring system:

def get_score(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Score two equal-length sequences position by position."""
    score = 0
    for index in range(len(dna1)):
        # Compare characters by value (==), not by identity (is).
        if dna1[index] == dna2[index]:
            score += match
        elif "-" not in (dna1[index], dna2[index]):
            score += mismatch
        else:
            score += gap
    return score

This is working fine.

Now I have to use recursion to find the best possible score for 2 strings. I receive 2 strings, and this time they can be of different sizes (I can't change the order of the letters). So I wrote this code, which adds "-" to the shorter string as many times as needed to create 2 strings of the same length, and puts the dashes at the start of the list. Now I want to start moving the dashes and record the score for every dash position, and finally get the highest possible score. For moving the dashes around I wrote a little bubble sort, but it doesn't seem to do what I want. I realize it's a long question, but I'd love some help. Let me know if anything I wrote is unclear.

def best_score(dna1, dna2, match=1, mismatch=-1, gap=-2,\
                         score=[], count=0):
    """"""

    diff = abs(len(dna1) - len(dna2))

    if len(dna1) is len(dna2):
        short = []
    elif len(dna1) < len(dna2):
        short = [base for base in iter(dna1)]
    else:
        short = [base for base in iter(dna2)]

    for i in range(diff):
        short.insert(count, "-")

    for i in range(diff+count, len(short)-1):
        if len(dna1) < len(dna2):
            score.append((get_score(short, dna2),\
                          ''.join(short), dna2))
        else:
            score.append((get_score(dna1, short),\
                          dna1, ''.join(short)))
        short[i+1], short[i] = short[i], short[i+1]

    if count is min(len(dna1), len(dna2)):
        return score[score.index(max(score))]
    return best_score(dna1, dna2, 1, -1, -2, score, count+1)

First, if I have correctly deciphered your cost function, the choice of the best alignment does not depend on gap: the number of dashes is fixed, so the gap penalty adds the same constant to every alignment.

Second, the score depends linearly on the number of mismatches, so the best alignment does not depend on the exact values of match and mismatch either, as long as they are positive and negative respectively.

So your task reduces to finding the longest subsequence of the longer string's letters that strictly matches a subsequence of the letters of the shorter one.
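The two observations above can be checked by brute force on small inputs. The sketch below is only an illustration: the names `pair_score` and `scored_alignments` are made up here, and it assumes dashes are only ever inserted into the shorter string. It tries every dash placement and records both the score and the number of matching letters.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one column of the alignment (same rules as the question)."""
    if "-" in (a, b):
        return gap
    return match if a == b else mismatch

def scored_alignments(long_s, short_s, match=1, mismatch=-1, gap=-2):
    """Return (score, matches, padded_short) for every placement of the dashes."""
    n = len(long_s)
    results = []
    # Choose which positions of the padded string hold the letters of short_s;
    # the remaining positions hold dashes. Letter order is preserved.
    for positions in combinations(range(n), len(short_s)):
        padded = ["-"] * n
        for pos, base in zip(positions, short_s):
            padded[pos] = base
        score = sum(pair_score(a, b, match, mismatch, gap)
                    for a, b in zip(long_s, padded))
        matches = sum(a == b for a, b in zip(long_s, padded))
        results.append((score, matches, "".join(padded)))
    return results

results = scored_alignments("TACG", "ACA")
best = max(results)  # highest score first
print(best)
```

For this input the alignment with the highest score is also the one with the most matches, and changing `gap` shifts every score by the same amount. Brute force grows combinatorially with the length difference, so it is only useful as a check.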

Third, let M(string, substr) be the function returning the length of the best match described above. If the first letter of your shorter string is S, that is, substr == 'S<letters>', then

M(string, 'S<letters>') = \
    max(1 + M(string[string.index(S) + 1:], '<letters>'),  # S found and matched
        M(string[1:], '<letters>'))  # letter S not matched, placed at 1st place

The latter is an easy-to-implement recursive expression.
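A minimal sketch of such a recursion follows. It is a variant of the expression above rather than a literal transcription: instead of jumping to `string.index(S)`, it steps through the longer string one letter at a time and only skips a letter when enough letters remain to place the rest of the shorter string. The names `max_matches` and `m` are illustrative, and the function assumes `long_s` is at least as long as `short_s`.

```python
from functools import lru_cache

def max_matches(long_s, short_s):
    """Largest number of equal-letter pairs when short_s is padded with
    dashes to the length of long_s (letter order is preserved)."""

    @lru_cache(maxsize=None)
    def m(i, j):
        # i: current position in long_s, j: current position in short_s
        remaining_long = len(long_s) - i
        remaining_short = len(short_s) - j
        if remaining_short == 0:
            return 0  # only dashes are left to place
        # Option 1: pair short_s[j] with long_s[i] (match counts as 1).
        best = (long_s[i] == short_s[j]) + m(i + 1, j + 1)
        # Option 2: put a dash under long_s[i], if a dash is still available.
        if remaining_long > remaining_short:
            best = max(best, m(i + 1, j))
        return best

    return m(0, 0)

print(max_matches("TACG", "ACA"))
```

The `lru_cache` decorator memoizes the recursion so each `(i, j)` pair is computed once. For long sequences Python's recursion limit may still be a concern.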

For a pair string, substr with m = M(string, substr), the best score is equal to

m * match + (len(substr) - m) * mismatch + (len(string)-len(substr)) * gap
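As a worked example of the formula, take the strings from the question, TACG and ACA, where the best alignment (-ACA) has m = 2 matching letters. The helper name `score_from_matches` is just for illustration.

```python
def score_from_matches(m, len_string, len_substr, match=1, mismatch=-1, gap=-2):
    """Score of an alignment with m matches, using the formula above."""
    return (m * match
            + (len_substr - m) * mismatch
            + (len_string - len_substr) * gap)

# TACG vs ACA: 2 matches, 1 mismatch, 1 gap -> 2*1 + 1*(-1) + 1*(-2)
print(score_from_matches(2, len("TACG"), len("ACA")))  # -1
```

This agrees with the score of -1 given in the question for -ACA against TACG.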

By storing which branch gave the maximum in the recursive expression, it is straightforward to recover exactly what the best match is.
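One possible way to do that, sketched under the same assumptions (dashes go only into the shorter string; `best_alignment` and `solve` are names invented for this example), is to have the recursion return the padded string together with its score:

```python
from functools import lru_cache

def best_alignment(dna1, dna2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_dna1, aligned_dna2) for the best dash placement."""
    swapped = len(dna1) > len(dna2)
    short_s, long_s = (dna2, dna1) if swapped else (dna1, dna2)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Best (score, padded tail) for long_s[i:] against short_s[j:].
        rem_long = len(long_s) - i
        rem_short = len(short_s) - j
        if rem_long == 0:
            return 0, ""
        options = []
        if rem_short > 0:
            # Pair the next letters of both strings.
            pair = match if long_s[i] == short_s[j] else mismatch
            tail_score, tail = solve(i + 1, j + 1)
            options.append((pair + tail_score, short_s[j] + tail))
        if rem_long > rem_short:
            # Spend one of the remaining dashes here.
            tail_score, tail = solve(i + 1, j)
            options.append((gap + tail_score, "-" + tail))
        return max(options)  # ties are broken by comparing the padded strings

    score, padded = solve(0, 0)
    return (score, long_s, padded) if swapped else (score, padded, long_s)

print(best_alignment("ACA", "TACG"))
```

Returning the string along with the score keeps the code short. For long inputs, storing only the chosen branch and rebuilding the string afterwards would use less memory.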
