Python: comparing two strings

Question

I would like to know if there is a library that can tell me approximately how similar two strings are.

I am not looking for anything specific, but in this case:

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'

we could say that b and a are approximately 90% similar.

Is there a library which can do this?

Answer 1

>>> import difflib
>>> a = 'alex is a buff dude'
>>> b = 'a;exx is a buff dud'
>>> difflib.SequenceMatcher(None, a, b).ratio()
0.89473684210526316
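
As a small follow-up sketch, the same ratio can be wrapped in a helper, and `difflib.get_close_matches` can pick the closest entry from a list of candidates. The helper name `similarity` and the candidate list are illustrative assumptions, not part of the original answer.

```python
import difflib

def similarity(a, b):
    """Return a similarity score between 0.0 and 1.0 (hypothetical helper)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(similarity(a, b))

# get_close_matches returns the candidates scoring at least `cutoff`, best first.
candidates = [a, 'qqq www zzz']  # made-up candidate list
print(difflib.get_close_matches(b, candidates, n=1, cutoff=0.6))
```

The ratio is 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so identical strings score 1.0 and strings with nothing in common score 0.0.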

Answer 2

Look up the Levenshtein algorithm for comparing strings. Here's a random implementation found via Google: http://hetland.org/coding/python/levenshtein.py
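
For reference, here is a minimal sketch of the Levenshtein distance using the usual two-row dynamic programming table. It is not the linked implementation; the function names and the choice to normalize by the longer string's length are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to use less memory
    previous = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, ch1 in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, ch2 in enumerate(b, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(previous[j] + 1,          # delete ch1
                               current[j - 1] + 1,       # insert ch2
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def levenshtein_similarity(a, b):
    """Scale the distance to 0.0..1.0 by the longer length (one convention of several)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(levenshtein(a, b), levenshtein_similarity(a, b))
```

Note that this score will generally differ from the difflib ratio, because the two measures count differences in different ways.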

Answer 3

http://en.wikipedia.org/wiki/Levenshtein_distance

There are a few libraries on PyPI, but be aware that computing the distance is expensive, especially for longer strings: the standard algorithm takes time proportional to the product of the two string lengths.

You may also want to check out Python's difflib: http://docs.python.org/library/difflib.html
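
If cost is a concern and difflib is good enough, one possible approach is to reject clearly dissimilar pairs with the cheap upper bounds that `SequenceMatcher` provides before running the full comparison. The sketch below assumes a hypothetical helper name `is_similar` and an arbitrary default threshold of 0.8.

```python
from difflib import SequenceMatcher

def is_similar(a, b, threshold=0.8):
    """Return True if ratio(a, b) >= threshold, trying cheap upper bounds first."""
    matcher = SequenceMatcher(None, a, b)
    # real_quick_ratio() and quick_ratio() are upper bounds on ratio(),
    # so a pair can be rejected without running the full comparison.
    return (matcher.real_quick_ratio() >= threshold
            and matcher.quick_ratio() >= threshold
            and matcher.ratio() >= threshold)

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(is_similar(a, b))
print(is_similar('a', 'a' * 100))
```

Because `and` short-circuits, the expensive `ratio()` call only runs for pairs that pass both cheaper checks.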

Answer 4

Another way is to use the longest common subsequence (LCS). There is an implementation on Daniweb that uses my LCS implementation (this is also defined in difflib).

Here is a simple length-only version that uses a list as the data structure:

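def longest_common_sequence(a, b):
    """Return the similarity of a and b as a percentage, based on LCS length."""
    n1 = len(a)
    n2 = len(b)

    # previous holds the LCS lengths from the previous row of the DP table
    previous = [0] * n2

    over = 0
    for ch1 in a:
        left = corner = 0
        for ch2 in b:
            over = previous.pop(0)  # value directly above the current cell
            if ch1 == ch2:
                this = corner + 1
            else:
                this = max(over, left)
            previous.append(this)
            left, corner = this, over
    # the last cell of the final row is the LCS length of the full strings
    return 200.0 * previous.pop() / (n1 + n2)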

Here is my second version, which actually returns the common string and uses a deque as the data structure (the example data is included as a use case):

from collections import deque

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'

def lcs_tuple(a, b):
    n1 = len(a)
    n2 = len(b)

    # Each cell is a (length, subsequence) pair; previous holds the row above.
    previous = deque((0, '') for _ in range(n2))

    # Initialise this so that an empty a or b does not raise a NameError.
    this = over = (0, '')
    for i in range(n1):
        left = corner = (0, '')
        for j in range(n2):
            over = previous.popleft()
            if a[i] == b[j]:
                this = corner[0] + 1, corner[1] + a[i]
            else:
                # Tuples compare by length first, so max() keeps the longer one.
                this = max(over, left)
            previous.append(this)
            left, corner = this, over
    return 200.0 * this[0] / (n1 + n2), this[1]

print(lcs_tuple(a, b))

""" Output:
(89.47368421052632, 'aex is a buff dud')
"""

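For comparison, the text that difflib itself matched can be recovered with `SequenceMatcher.get_matching_blocks()`. This is only a sketch: the helper name `common_text` is an assumption, and difflib's matching is a heuristic that is not guaranteed to find a longest common subsequence in every case.

```python
from difflib import SequenceMatcher

def common_text(a, b):
    """Join the blocks that SequenceMatcher matched between a and b."""
    matcher = SequenceMatcher(None, a, b)
    # Each block is (start in a, start in b, size); the last one is a size-0 sentinel.
    return ''.join(a[block.a:block.a + block.size]
                   for block in matcher.get_matching_blocks())

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
common = common_text(a, b)
print(len(common), repr(common))
```

The blocks are returned in increasing order of position in both strings, so the joined text is always a common subsequence of the two inputs.
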

Answer 1: 18, accepted, 2010-08-23 21:06:18

Answer 2: 6, 2010-08-23 20:34:24

Answer 3: 6, 2010-08-23 20:35:39

Answer 4: 1, 2010-08-23 21:12:42
