Python - Matching strings in two lists

Question

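I have 2 lists: actual and predicted. I need to compare the two lists and determine the number of fuzzy matches. I say fuzzy matches because the strings will not be exactly the same. I am using SequenceMatcher from the difflib library.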

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
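For reference, `SequenceMatcher.ratio()` returns 2.0*M/T, where M is the number of matched characters and T is the total number of characters in both strings. The sketch below checks this on a few pairs taken from the example lists. The helper `manual_ratio` is hypothetical and exists only to show the formula.

```python
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def manual_ratio(a, b):
    # hypothetical helper: recompute ratio() as 2*M/T from the matching blocks
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    matched = sum(block.size for block in blocks)
    return 2.0 * matched / (len(a) + len(b))

for a, b in [("Appl", "Apple"), ("Ornge", "Orange"), ("Green", "Orange")]:
    print(a, b, round(similar(a, b), 3), round(manual_ratio(a, b), 3))
```

"Appl"/"Apple" share 4 characters out of 9 (8/9, about 0.89), so they clear an 80% threshold. "Green"/"Orange" do not.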

I can assume that strings with a match ratio greater than 80% are considered the same. Example lists:

actual=[ "Appl", "Orange", "Ornge", "Peace"]
predicted=["Red", "Apple", "Green", "Peace", "Orange"]

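I need a way to determine that Apple, Peace and Orange from the predicted list were found in the actual list, so there are only 3 matches, not 5. How can I do this efficiently?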
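The standard library also offers `difflib.get_close_matches(word, possibilities, n, cutoff)`, which returns the candidates whose ratio is at least `cutoff`. The following minimal sketch uses it in place of a hand-written loop. Note that it applies `>=` rather than the question's strict `> 0.8`, which only matters for a ratio of exactly 0.8.

```python
from difflib import get_close_matches

actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

# keep a predicted word if at least one actual word has ratio() >= 0.8;
# n=1 because we only need to know whether one close match exists
matched = [p for p in predicted if get_close_matches(p, actual, n=1, cutoff=0.8)]

print(matched)
print(len(matched))
```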

Answer 1

If you really are looking for fuzzy matches, you can use your similar function in the following set comprehension to get the desired output:

threshold = 0.8
result = {x for x in predicted for y in actual if similar(x, y) > threshold}

Answer 2

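You can convert both lists to sets and take their intersection.

With the lists below (where actual also contains "Apple"), this gives you three items: {'Peace', 'Apple', 'Orange'}.

You can then compute the ratio of the size of the result set to the length of the actual list.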

actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
predicted=["Red", "Apple", "Green", "Peace", "Orange"]

res = set(actual).intersection(predicted)

print (res)
print ((len(res) / len(actual)) * 100)
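This answer adds the exact string "Apple" to the actual list. The sketch below runs the same intersection on the question's original lists. It shows that exact set operations find only two of the three intended matches, because "Appl" is not identical to "Apple".

```python
actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

# exact matching only: "Appl" and "Apple" are different strings
exact = set(actual) & set(predicted)
print(sorted(exact))
```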

Edit:

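To use the ratio, you will need a nested loop. Since sets are implemented as hash tables, membership lookups are O(1) on average, so I prefer to store the actual values in a set.

If a predicted value is in actual (an exact match), just add it to the result set. (In the best case every value is like this, and the overall complexity is O(n).)

If a predicted value is not in actual, loop over the actual values and check whether any of them has a ratio above 0.8. (In the worst case every value is like this, and the complexity is O(n^2), or O(n*m) for lists of different lengths.)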

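from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

actual = {"Appl", "Orange", "Ornge", "Peace"}
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

result = set()  # {} would create a dict, which has no .add()

for pre in predicted:
    if pre in actual:            # exact match: average O(1) set lookup
        result.add(pre)
    else:
        for act in actual:       # fall back to pairwise fuzzy comparison
            if similar(pre, act) > 0.8:
                result.add(pre)
                break            # one fuzzy match is enough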
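If the fuzzy branch dominates, difflib's own tools can make the loop cheaper. According to the difflib documentation, `SequenceMatcher` caches information about its second sequence, so `set_seq2()` can be called once per predicted word and `set_seq1()` for each actual word. In addition, `real_quick_ratio()` and `quick_ratio()` are cheap upper bounds on `ratio()`, so they can reject pairs early. This is a sketch under those assumptions, and `fuzzy_matches` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def fuzzy_matches(actual, predicted, threshold=0.8):
    # hypothetical helper: returns the predicted strings that match some actual string
    actual_set = set(actual)
    matcher = SequenceMatcher()
    result = set()
    for pre in predicted:
        if pre in actual_set:          # exact match: average O(1) hash lookup
            result.add(pre)
            continue
        matcher.set_seq2(pre)          # seq2 details are cached, so set it once per word
        for act in actual_set:
            matcher.set_seq1(act)
            # both quick ratios are upper bounds on ratio(), so failing either
            # one means ratio() cannot exceed the threshold either
            if (matcher.real_quick_ratio() > threshold
                    and matcher.quick_ratio() > threshold
                    and matcher.ratio() > threshold):
                result.add(pre)
                break
    return result

print(fuzzy_matches(["Appl", "Orange", "Ornge", "Peace"],
                    ["Red", "Apple", "Green", "Peace", "Orange"]))
```

The filters do not change which pairs match. They only skip the full `ratio()` computation for pairs that cannot reach the threshold.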

Answer 3

import itertools

# x is an (actual, predicted) pair; x[1] keeps the predicted string
{x[1] for x in itertools.product(actual, predicted) if similar(*x) > 0.80}

Answer 4

>>> actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
>>> predicted=["Red", "Apple", "Green", "Peace", "Orange"]
>>> set(actual) & set(predicted)
set(['Orange', 'Peace', 'Apple'])

Answer 5

In this case, you only need to check whether each element of the predicted list is in the actual list. If it is, add it to a new list.

In [2]: actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
...: predicted=["Red", "Apple", "Green", "Peace", "Orange"]


In [3]: [i for i in predicted if i in actual]
Out[3]: ['Apple', 'Peace', 'Orange']
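In the comprehension above, `i in actual` scans the list each time. The sketch below converts actual to a set once, so each membership test is O(1) on average while the order of predicted is still preserved. Like the original, it only finds exact matches.

```python
actual = ["Apple", "Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]

actual_set = set(actual)  # build once: membership tests become average O(1)
# keeps the order of `predicted`, unlike a plain set intersection
common = [i for i in predicted if i in actual_set]
print(common)
```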

Answer 6

A simple (but inefficient) approach would be:

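from difflib import SequenceMatcher

counter = 0
for item in predicted:
    # count each predicted item once if any actual string is more than 80% similar
    if any(SequenceMatcher(None, a, item).ratio() > 0.8 for a in actual):
        counter += 1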

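This counts the fuzzily matching elements, which is what you want, rather than only the identical ones (which is what most of the other answers give you).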

Answer 7

First, take the intersection of the two sets:

actual, predicted = set(actual), set(predicted)

exact = actual.intersection(predicted)

If this includes all of your actual words, you are done. Otherwise:

fuzzy = []
if len(exact) < len(actual):
    # collect the predicted words that fuzzily match an unmatched actual word
    fuzzy = [match for word in actual - predicted for match in predicted if similar(word, match) > 0.8]

Finally, the resulting set is exact.union(set(fuzzy)).
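Here is a self-contained sketch of this exact-then-fuzzy approach on the question's lists. It collects the predicted word rather than the actual word, so the result answers "which predictions were found" and yields 3 items rather than also counting "Appl" and "Ornge". Restricting the fuzzy pass to `predicted - exact` is an added assumption that skips words already matched exactly.

```python
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

actual = {"Appl", "Orange", "Ornge", "Peace"}
predicted = {"Red", "Apple", "Green", "Peace", "Orange"}

exact = actual & predicted   # words present in both sets
fuzzy = set()
if len(exact) < len(actual):
    # only unmatched actual words need the slower pairwise comparison
    fuzzy = {match for word in actual - predicted
             for match in predicted - exact
             if similar(word, match) > 0.8}

result = exact | fuzzy
print(sorted(result))
```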

Answer 8

You can also try the following approach:

import itertools

fuzlist = [ "Appl", "Orange", "Ornge", "Peace"]
actlist = ["Red", "Apple", "Green", "Peace", "Orange"]
foundlist = []
for fuzname in fuzlist:
    for name in actlist:
        # try every ordering of name's letters (len(name)! permutations)
        for actname in itertools.permutations(name):
            if fuzname.lower() in ''.join(actname).lower():
                foundlist.append(name)
                break  # stop trying permutations of this name

print(set(foundlist))
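Because the number of permutations grows factorially with word length, the same test can be written without generating them. A string is a substring of some permutation of `name` exactly when each of its letters occurs in `name` at least as often. The sketch below swaps in `collections.Counter` for that check. It is a letter-count test that ignores order, just like the permutation version, and not a similarity ratio. The helper `letters_fit` is a hypothetical name.

```python
from collections import Counter

fuzlist = ["Appl", "Orange", "Ornge", "Peace"]
actlist = ["Red", "Apple", "Green", "Peace", "Orange"]

def letters_fit(fuzname, name):
    # hypothetical helper: True when every letter of fuzname (with multiplicity)
    # also occurs in name, ignoring case; Counter subtraction drops non-positive counts
    return not (Counter(fuzname.lower()) - Counter(name.lower()))

foundlist = {name for fuzname in fuzlist for name in actlist if letters_fit(fuzname, name)}
print(sorted(foundlist))
```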

8 solutions (score, posting time):

Solution 1: score 2, accepted, 2017-06-21 12:43:39
Solution 2: score 1, 2017-06-21 12:37:08
Solution 3: score 1, 2017-06-21 12:51:09
Solution 4: score 0, 2017-06-21 12:36:31
Solution 5: score 0, 2017-06-21 12:46:23
Solution 6: score 0, 2017-06-21 12:47:30
Solution 7: score 0, 2017-06-21 12:52:33
Solution 8: score 0, 2017-06-21 14:16:44
