Starting with Python - Hamming distance of a list of strings

I'm new to Python and I'm trying to obtain the Hamming distance between a pair of DNA sequences. I was able to do this for a single pair, but I don't know how to obtain a list of Hamming distances for more than one pair of DNA sequences. Could anyone please guide me on this?

dna1 = 'ACCTAT'
dna2 = 'CATTGA'

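def distance(strand_a, strand_b):
    # The Hamming distance is only defined for strings of equal length.
    if len(strand_a) != len(strand_b):
        raise ValueError("The strings are not the same length")
    i = 0
    n = 0  # number of positions where the two strands differ
    while i < len(strand_a):
        if strand_a[i] != strand_b[i]:
            n += 1
        i += 1
    return n

print("The distance is:", distance(dna1, dna2))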

Output:

The distance is: 5

What would be the best way to obtain a list of Hamming distances between three pairs of DNA sequences? I tried adapting the code above, but I haven't been able to find a solution.

Given these two lists, I want to get the Hamming distance between the 1st, 2nd and 3rd pairs of DNA sequences:

dna1 = ['ACTGG','ATGCA','AACTG']
dna2 = ['ACTGA','ATGGG','ATGAC']

Where the output would be:

distances = [1, 2, 4]

Thank you all for your help!

You can try:

import numpy as np

dna1 = ['ACTGG','ATGCA','AACTG']
dna2 = ['ACTGA','ATGGG','ATGAC']

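# Compare each pair position by position and count the mismatches.
# int() converts the NumPy integer to a plain Python int, so the list prints as [1, 2, 4].
[int((np.array(list(x)) != np.array(list(y))).sum()) for x, y in zip(dna1, dna2)]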

It gives:

[1, 2, 4]
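
If you would rather not depend on NumPy, the same idea can be sketched in plain Python by pairing the sequences with zip and counting mismatches. This is only a minimal sketch: the names hamming_distance and pairwise_distances are hypothetical helpers introduced here for illustration, not part of the question's code, and it assumes both lists have the same number of sequences.

```python
def hamming_distance(strand_a, strand_b):
    """Count the positions at which two equal-length strings differ."""
    if len(strand_a) != len(strand_b):
        raise ValueError("The strings are not the same length")
    # zip pairs up the characters; each mismatch contributes 1 to the sum.
    return sum(a != b for a, b in zip(strand_a, strand_b))


def pairwise_distances(seqs_a, seqs_b):
    """Return the Hamming distance for each corresponding pair of sequences."""
    if len(seqs_a) != len(seqs_b):
        raise ValueError("The lists are not the same length")
    return [hamming_distance(a, b) for a, b in zip(seqs_a, seqs_b)]


dna1 = ['ACTGG', 'ATGCA', 'AACTG']
dna2 = ['ACTGA', 'ATGGG', 'ATGAC']

distances = pairwise_distances(dna1, dna2)
print(distances)  # [1, 2, 4]
```

Keeping the length check inside hamming_distance means a mismatched pair raises the same ValueError as the original function instead of being silently truncated by zip.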
