简体   繁体   中英

Find the position of difference between two strings

I have two strings of equal length, how can I find all the locations where the strings are different?

For example, "HELPMEPLZ" and "HELPNEPLX" are different at positions 4 and 8.

Try this:

s1 = 'HELPMEPLZ'
s2 = 'HELPNEPLX'
[i for i in xrange(len(s1)) if s1[i] != s2[i]]

It will return:

> [4, 8]

The above solution will return a list with the indexes in sorted order, won't create any unnecessary intermediate data structures and it will work on Python 2.3 - 2.7. For Python 3.x replace xrange for range .

Python really comes with batteries included. Have a look at difflib

>>> import difflib
>>> a='HELPMEPLZ'
>>> b='HELPNEPLX'
>>> s = difflib.SequenceMatcher(None, a, b)
>>> for block in s.get_matching_blocks():
...     print block
Match(a=0, b=0, size=4)
Match(a=5, b=5, size=3)
Match(a=9, b=9, size=0)

difflib is very powerful and a some study of the documentation is really recommended.

>>> from itertools import izip
>>> s1 = 'HELPMEPLZ'
>>> s2 = 'HELPNEPLX'
>>> [i for i,(a1,a2)  in enumerate(izip(s1,s2)) if a1!=a2]
[4, 8]

If you store the two strings in a and b , you can loop through all the items and check for inequality.

python interactive interpreter:

>>> for i in range(len(a)):
...   if a[i] != b[i]: print i, a[i], b[i]
... 
4 M N
8 Z X

Another way to do this is with list comprehensions. It's all in one line, and the output is a list.

>>> [i for i in range(len(a)) if a[i] != b[i]]
[4, 8]

That makes it really easy to wrap into a function, which makes calling it on a variety of inputs easy.

>>> def dif(a, b):
...     return [i for i in range(len(a)) if a[i] != b[i]]
...
>>> dif('HELPMEPLZ', 'HELPNEPLX')
[4, 8]
>>> dif('stackoverflow', 'stacklavaflow')
[5, 6, 7, 8]

Pair up the strings character-by-character and iterate over this collection together with a counting index. Test whether the characters in each pair differ; if they do, output the index of where.

Using Python builtin functions you can do this neatly in one line:

>>> x = 'HELPMEPLZ'
>>> y = 'HELPNEPLX'
>>> {i for i, (left, right) in enumerate(zip(x,y)) if left != right}
{8, 4}

Building on the direction pointed by @FredrikPihl, here you have a solution that is also able to detect insertions/deletions using a module in the Python Standard Library:

import difflib
a = 'HELPMEPLZ'
b = 'HELPNEPLX'
s = difflib.SequenceMatcher(None, a, b, autojunk=False)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    if tag != 'equal':
        print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
            tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))

With the output:

replace   a[4:5] --> b[4:5]      'M' --> 'N'
replace   a[8:9] --> b[8:9]      'Z' --> 'X'

Let's see how it works with a similar example including deletions and additions:

a = 'HELPMEPLZ'
b = 'HLP NEE PLX'

With the output:

delete    a[1:2] --> b[1:1]      'E' --> ''
replace   a[4:5] --> b[3:5]      'M' --> ' N'
insert    a[6:6] --> b[6:8]       '' --> 'E '
replace   a[8:9] --> b[10:11]      'Z' --> 'X'

Easiest way is to split data into two char arrays and then loop through comparing the letters and return the index when the two chars do not equal each other.

This method will work fine as long as both strings are equal in length.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM