Find the position of difference between two strings

I have two strings of equal length. How can I find all the positions where the strings differ?

For example, "HELPMEPLZ" and "HELPNEPLX" are different at positions 4 and 8.

Try this:

s1 = 'HELPMEPLZ'
s2 = 'HELPNEPLX'
[i for i in xrange(len(s1)) if s1[i] != s2[i]]

It will return:

> [4, 8]

The above solution returns a list with the indexes in sorted order, doesn't create any unnecessary intermediate data structures, and works on Python 2.3 - 2.7. For Python 3.x, replace xrange with range.
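For Python 3, the same comprehension can be wrapped as follows. This is only a minimal sketch: the function name diff_positions is an illustrative choice, not part of the original answer, and equal-length inputs are assumed.

```python
def diff_positions(s1, s2):
    """Return the indexes at which two equal-length strings differ."""
    # range() is lazy in Python 3, so no intermediate list of indexes is built.
    return [i for i in range(len(s1)) if s1[i] != s2[i]]


print(diff_positions('HELPMEPLZ', 'HELPNEPLX'))  # [4, 8]
```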

Python really comes with batteries included. Have a look at difflib:

>>> import difflib
>>> a='HELPMEPLZ'
>>> b='HELPNEPLX'
>>> s = difflib.SequenceMatcher(None, a, b)
>>> for block in s.get_matching_blocks():
...     print(block)
Match(a=0, b=0, size=4)
Match(a=5, b=5, size=3)
Match(a=9, b=9, size=0)

difflib is very powerful, and some study of its documentation is strongly recommended.
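The matching blocks describe what the strings have in common, so the differing positions are the gaps between them. A possible sketch of that step is shown below; the helper name unmatched_positions is hypothetical. Note that SequenceMatcher aligns the two sequences rather than comparing them index by index, so for other inputs the result may not coincide with a plain positional comparison.

```python
import difflib


def unmatched_positions(a, b):
    """Return the indexes of `a` not covered by any matching block."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    positions = []
    cursor = 0  # first index of `a` not yet accounted for
    for block in matcher.get_matching_blocks():
        # Everything between the previous block and this one is unmatched.
        positions.extend(range(cursor, block.a))
        cursor = block.a + block.size
    return positions


print(unmatched_positions('HELPMEPLZ', 'HELPNEPLX'))  # [4, 8]
```

The final block returned by get_matching_blocks() is a zero-size sentinel at the end of both strings, which is why the loop also picks up any unmatched tail.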

>>> from itertools import izip  # Python 2 only; in Python 3 use the built-in zip instead
>>> s1 = 'HELPMEPLZ'
>>> s2 = 'HELPNEPLX'
>>> [i for i,(a1,a2)  in enumerate(izip(s1,s2)) if a1!=a2]
[4, 8]

If you store the two strings in a and b, you can loop through all the items and check for inequality.

In the Python interactive interpreter:

>>> for i in range(len(a)):
...   if a[i] != b[i]: print(i, a[i], b[i])
... 
4 M N
8 Z X

Another way to do this is with list comprehensions. It's all in one line, and the output is a list.

>>> [i for i in range(len(a)) if a[i] != b[i]]
[4, 8]

That makes it really easy to wrap into a function, which in turn makes it easy to call on a variety of inputs.

>>> def dif(a, b):
...     return [i for i in range(len(a)) if a[i] != b[i]]
...
>>> dif('HELPMEPLZ', 'HELPNEPLX')
[4, 8]
>>> dif('stackoverflow', 'stacklavaflow')
[5, 6, 7, 8]

Pair up the strings character by character and iterate over the pairs together with a counting index. Test whether the characters in each pair differ; if they do, output the index.

Using Python built-in functions you can do this neatly in one line:

>>> x = 'HELPMEPLZ'
>>> y = 'HELPNEPLX'
>>> {i for i, (left, right) in enumerate(zip(x,y)) if left != right}
{8, 4}
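A set does not keep the indexes in order, which is why 8 is shown before 4 above. If ordered output is wanted, the same enumerate/zip pairing can build a list instead. The sketch below also reports the differing characters; the name diff_pairs is only illustrative.

```python
def diff_pairs(x, y):
    """Return (index, left, right) for every position where x and y differ."""
    # zip() stops at the shorter input, so equal lengths are assumed here.
    return [(i, left, right)
            for i, (left, right) in enumerate(zip(x, y))
            if left != right]


print(diff_pairs('HELPMEPLZ', 'HELPNEPLX'))  # [(4, 'M', 'N'), (8, 'Z', 'X')]
```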

Building on the direction pointed out by @FredrikPihl, here is a solution that can also detect insertions and deletions, using a module from the Python Standard Library:

import difflib
a = 'HELPMEPLZ'
b = 'HELPNEPLX'
s = difflib.SequenceMatcher(None, a, b, autojunk=False)
for tag, i1, i2, j1, j2 in s.get_opcodes():
    if tag != 'equal':
        print('{:7}   a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
            tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))

With the output:

replace   a[4:5] --> b[4:5]      'M' --> 'N'
replace   a[8:9] --> b[8:9]      'Z' --> 'X'

Let's see how it works on a similar example that includes deletions and insertions:

a = 'HELPMEPLZ'
b = 'HLP NEE PLX'

With the output:

delete    a[1:2] --> b[1:1]      'E' --> ''
replace   a[4:5] --> b[3:5]      'M' --> ' N'
insert    a[6:6] --> b[6:8]       '' --> 'E '
replace   a[8:9] --> b[10:11]      'Z' --> 'X'
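To reuse this on arbitrary inputs, the opcode loop could be wrapped in a small function along these lines. This is a sketch only; the name edit_operations is hypothetical and not part of difflib.

```python
import difflib


def edit_operations(a, b):
    """Return the non-'equal' opcodes that turn `a` into `b`."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    # Each opcode is a (tag, i1, i2, j1, j2) tuple; keep only the real edits.
    return [op for op in matcher.get_opcodes() if op[0] != 'equal']


for tag, i1, i2, j1, j2 in edit_operations('HELPMEPLZ', 'HELPNEPLX'):
    print(tag, (i1, i2), (j1, j2))
```

Returning the raw tuples instead of printing them leaves the formatting to the caller.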

The easiest way is to split the data into two char arrays, then loop through them comparing the letters and return the index whenever the two chars are not equal.

This method works fine as long as both strings are equal in length.
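If the lengths might differ, one possible way to relax that restriction is to pad the shorter string while pairing the characters. The sketch below uses itertools.zip_longest for this; the function name diff_positions_any_length and the choice to report every index past the shorter string as a difference are assumptions, not something the answer above specifies.

```python
from itertools import zip_longest


def diff_positions_any_length(s1, s2):
    """Return indexes where s1 and s2 differ, including any unmatched tail."""
    # zip_longest pads the shorter string with None, which never equals a
    # character, so every index past the shorter string is reported.
    return [i for i, (c1, c2) in enumerate(zip_longest(s1, s2)) if c1 != c2]


print(diff_positions_any_length('HELPMEPLZ', 'HELPNEPLX'))  # [4, 8]
print(diff_positions_any_length('HELP', 'HELPME'))          # [4, 5]
```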
