
Find the Hamming distance between two DNA strings

I'm just learning Python 3. The task is to ask the user for two strings and find the Hamming distance between them. Input sequences should only include the nucleotides 'A', 'T', 'G' and 'C'. The program should ask the user to re-enter a sequence if it contains an invalid character. The program should also check that the two strings are the same length; if they are not, it should ask the user to enter the strings again. The user should be able to enter upper case, lower case, or a mix of both.

The program should print the output in the following format:

please enter string one: GATTACA
please enter string two: GACTATA
GATTACA
|| || |  
GACTATA
The hamming distance of sequence GATTACA and GACTATA is 2
So the Hamming distance is 2.

Here is what I have tried so far, but it does not give me an answer:

def hamming_distance(string1, string2):
    string1 = input("please enter first sequence")
    string2 = input("please enter second sequence")
    distance = 0
     L = len(string1)
    for i in range(L):
        if string1[i] != string2[i]:
            distance += 1
    return distance

I get an indentation error on the line: L = len(string1)
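The IndentationError comes from the extra leading space before L = len(string1): every statement in a function body has to be indented by the same amount. A minimal corrected sketch of the same loop follows. It also drops the input() calls from the function, on the assumption that the parameters are meant to carry the two sequences rather than be overwritten.

```python
def hamming_distance(string1, string2):
    # Count the positions at which the two strings differ.
    # Assumes both strings have the same length.
    distance = 0
    L = len(string1)  # now aligned with the other statements in the body
    for i in range(L):
        if string1[i] != string2[i]:
            distance += 1
    return distance


print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```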

def hamming_distance(s1, s2):
    if len(s1) != len(s2):
        raise ValueError("Strand lengths are not equal!")
    return sum(ch1 != ch2 for ch1,ch2 in zip(s1,s2))

Alternatively, you could use this. I also added a check that raises an exception, because the Hamming distance is only defined for sequences of equal length, so an attempt to calculate it between sequences of different lengths should fail.

def distance(str1, str2):
    if len(str1) != len(str2):
        raise ValueError("Strand lengths are not equal!")
    else:
        return sum(1 for (a, b) in zip(str1, str2) if a != b)
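The functions above compute the distance but leave out the input handling and output format that the question asks for. One possible sketch of the whole program is below. The helper names (is_valid_sequence, read_sequence, match_line) and the wording of the error messages are my own assumptions, not part of the assignment.

```python
VALID_NUCLEOTIDES = set("ATGC")


def is_valid_sequence(seq):
    # True when seq is non-empty and contains only A, T, G and C.
    return len(seq) > 0 and set(seq) <= VALID_NUCLEOTIDES


def read_sequence(prompt):
    # Keep asking until the user enters a valid sequence.
    while True:
        seq = input(prompt).strip().upper()  # accept upper, lower or mixed case
        if is_valid_sequence(seq):
            return seq
        print("Invalid sequence: use only A, T, G and C.")


def match_line(s1, s2):
    # '|' where the characters agree, ' ' where they differ.
    return "".join("|" if a == b else " " for a, b in zip(s1, s2))


def hamming_distance(s1, s2):
    if len(s1) != len(s2):
        raise ValueError("Strand lengths are not equal!")
    return sum(a != b for a, b in zip(s1, s2))


def main():
    # Re-prompt for both sequences until their lengths match.
    while True:
        s1 = read_sequence("please enter string one: ")
        s2 = read_sequence("please enter string two: ")
        if len(s1) == len(s2):
            break
        print("The sequences must have the same length; please enter them again.")
    print(s1)
    print(match_line(s1, s2))
    print(s2)
    print("The hamming distance of sequence", s1, "and", s2,
          "is", hamming_distance(s1, s2))
```

Calling main() starts the interactive prompt. Input is converted to upper case before validation, so gattaca, GATTACA and GaTtAcA are all treated as the same sequence.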

The Wiki page has elegant Python and C implementations for computing Hamming distance. This implementation assumes that the Hamming distance is undefined for sequences of different lengths. However, there are two possible ways to report/compute a distance for strings of different lengths:

1) Perform multiple sequence alignment and then compute the Hamming distance between the two gap-filled character arrays ... formally referred to as edit distance or Levenshtein distance.

2) Alternatively, one could use the zip_longest function from itertools. The following implementation is equivalent to padding the shorter string with gap characters at its end so that it matches the length of the longer string. [Note: compared with approach 1, the value returned by this method can over-estimate the distance, because it does not account for alignment.]

import itertools

def hammingDist(str1, str2, fillchar = '-'):
    return sum([ch1 != ch2 for (ch1,ch2) in itertools.zip_longest(str1, str2, fillvalue = fillchar)])


def main():
    # Running test cases:    
    print('Expected value \t Value returned')
    print(0,'\t', hammingDist('ABCD','ABCD'))
    print(1,'\t', hammingDist('ABCD','ABED'))
    print(2,'\t', hammingDist('ABCD','ABCDEF'))
    print(2,'\t', hammingDist('ABCDEF','ABCD'))
    print(4,'\t', hammingDist('ABCD',''))
    print(4,'\t', hammingDist('','ABCD'))
    print(1,'\t', hammingDist('ABCD','ABcD'))

if __name__ == "__main__":
    main()    
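For comparison with approach 1, here is a minimal sketch of the Levenshtein (edit) distance using the standard dynamic-programming recurrence. The function name levenshtein is only illustrative; it is not taken from the Wiki page or from the code above.

```python
def levenshtein(s1, s2):
    # previous[j] is the edit distance between the prefix of s1
    # processed so far and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to the empty string
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                previous[j] + 1,         # delete ch1
                current[j - 1] + 1,      # insert ch2
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("ABCD", "ABCDEF"))  # prints 2
print(levenshtein("ABCD", "BCDE"))    # prints 2
```

For 'ABCD' and 'BCDE' the padded position-by-position comparison counts 4 mismatches, while the edit distance is 2 (delete A, append E). This is the kind of over-estimate the note in approach 2 refers to.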
