
Python comparing two strings

Is there a function to count how many characters two strings (of the same length) differ by? I mean substitutions only. For example, AAA would differ from AAT by 1 character.

This will work:

>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
1
>>> str1 = "AAABBBCCC"
>>> str2 = "ABCABCABC"
>>> sum(1 for x,y in enumerate(str1) if str2[x] != y)
6
>>>

The above solution uses sum, enumerate, and a generator expression.
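For readers less used to generator expressions, here is a rough loop equivalent of the one-liner. It is only a sketch: the helper name count_differences is made up for illustration, and it assumes both strings have the same length, as the question states.

```python
def count_differences(str1, str2):
    """Hypothetical helper: explicit-loop form of the generator expression."""
    count = 0
    # enumerate yields (position, character) pairs taken from str1
    for index, char in enumerate(str1):
        # compare against the character at the same position in str2
        if str2[index] != char:
            count += 1
    return count


print(count_differences("AAA", "AAT"))
print(count_differences("AAABBBCCC", "ABCABCABC"))
```

If str2 is shorter than str1, the indexing raises IndexError, which is also true of the original one-liner.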


Because True is equal to 1 (bool is a subclass of int), you could even do:

>>> str1 = "AAA"
>>> str2 = "AAT"
>>> sum(str2[x] != y for x,y in enumerate(str1))
1
>>>

But I personally prefer the first solution because it is clearer.

This is a nice use case for the zip function!

def count_substitutions(s1, s2):
    return sum(x != y for (x, y) in zip(s1, s2))

Usage:

>>> count_substitutions('AAA', 'AAT')
1

From the docs (Python 2; in Python 3, zip returns an iterator rather than a list):

zip(...)
    zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]

    Return a list of tuples, where each tuple contains the i-th element
    from each of the argument sequences.  The returned list is truncated
    in length to the length of the shortest argument sequence.
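Because zip truncates to the shorter argument, strings of unequal length are compared only over their common prefix and the extra characters are silently ignored. If that is not what you want, one possible approach is to check the lengths first. This is a sketch; the name count_substitutions_strict is hypothetical and not part of the answer above.

```python
def count_substitutions_strict(s1, s2):
    # Hypothetical variant: reject unequal lengths instead of letting zip truncate
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(s1, s2))


# Plain zip ignores the trailing "GG" and reports a single difference
print(sum(x != y for x, y in zip("AAA", "AATGG")))

# The strict variant behaves the same for equal-length input
print(count_substitutions_strict("AAA", "AAT"))
```

On Python 3.10 and later, zip(s1, s2, strict=True) raises ValueError on a length mismatch, which may be an alternative to the manual check.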

Building on what poke said, I would suggest the jellyfish package. It has several distance measures like the one you are asking for. Example from the documentation:

In [1]: import jellyfish

In [2]: jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
Out[2]: 1

or using your example:

In [3]: jellyfish.damerau_levenshtein_distance('AAA', 'AAT')
Out[3]: 1

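Because Damerau-Levenshtein distance also counts insertions, deletions, and transpositions, this works for strings of different lengths and should be able to handle most of what you throw at it. For the same reason, it is not a substitution-only count.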
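To make the difference between the two measures concrete: an adjacent transposition is a single Damerau-Levenshtein edit, but it changes two positions when the strings are compared character by character. The following standard-library sketch shows the position-wise count for the documentation example; the helper name position_mismatches is made up for illustration.

```python
def position_mismatches(s1, s2):
    # Hypothetical helper: counts positions where the characters differ
    return sum(a != b for a, b in zip(s1, s2))


# "sh" versus "hs" differs at two positions, although the
# Damerau-Levenshtein distance shown above is 1
print(position_mismatches("jellyfish", "jellyfihs"))  # 2

# For a plain substitution, both measures agree
print(position_mismatches("AAA", "AAT"))  # 1
```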

Similar to simon's answer, but you don't have to zip things just to call a function on the resulting tuples, because that's what map does anyway when given several iterables (and itertools.imap in Python 2). And the operator module has a handy function for !=, operator.ne. Hence:

import operator
sum(map(operator.ne, s1, s2))
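To see what map produces before the sum is taken, the intermediate values can be materialized as a list. This is a small sketch for Python 3, where map stops at the shorter input; the sample strings are taken from the question and the variable name flags is arbitrary.

```python
import operator

s1 = "AAA"
s2 = "AAT"

# With two iterables, map calls operator.ne(a, b) on each pair of characters
flags = list(map(operator.ne, s1, s2))
print(flags)       # [False, False, True]

# True counts as 1 and False as 0, so summing the flags counts the mismatches
print(sum(flags))  # 1
```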


 