Python - Matching strings in 2 lists

Question

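I have 2 lists: actual and predicted. I need to compare the two lists and determine the number of fuzzy matches. I say fuzzy matches because the strings will not be exactly the same. I am using SequenceMatcher from the difflib library.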

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

I can assume that strings with a match ratio above 80% are considered the same. Sample lists:

actual=[ "Appl", "Orange", "Ornge", "Peace"]
predicted=["Red", "Apple", "Green", "Peace", "Orange"]

I need a way to work out that Apple, Peace and Orange from the predicted list were found in the actual list. So there are only 3 matches, not 5. How can I do this efficiently?

Answer 1

If you really are looking for fuzzy matches, you can use your similar function in the following set comprehension to get the desired output.

threshold = 0.8
result = {x for x in predicted for y in actual if similar(x, y) > threshold}
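
For reference, here is a self-contained sketch of the comprehension above, run against the sample lists from the question. The similar helper is copied from the question; nothing else is assumed.

```python
from difflib import SequenceMatcher


def similar(a, b):
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, a, b).ratio()


actual = ["Appl", "Orange", "Ornge", "Peace"]
predicted = ["Red", "Apple", "Green", "Peace", "Orange"]
threshold = 0.8

# A set keeps each predicted string once, even if it matches
# several entries in actual ("Orange" matches "Orange" and "Ornge").
result = {x for x in predicted for y in actual if similar(x, y) > threshold}

print(sorted(result))  # ['Apple', 'Orange', 'Peace']
print(len(result))     # 3
```

Because the result is a set, len(result) gives the number of predicted strings that have at least one fuzzy match in actual.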

Answer 2

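You can convert both lists to sets and take their intersection.

That gives you three items: {'Peace', 'Apple', 'Orange'}.

You can then compute the ratio of the length of the result set to the length of the actual list.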

actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
predicted=["Red", "Apple", "Green", "Peace", "Orange"]

res = set(actual).intersection(predicted)

print (res)
print ((len(res) / len(actual)) * 100)

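Edit:

To use the ratio, you will need a nested loop. Since sets are implemented as hash tables, lookup is O(1), so I would prefer to keep the actual values in a set.

If a predicted value is in actual (an exact match), simply add it to the result set. (The best case is that every value is an exact match, giving a final complexity of O(n).)

If a predicted value is not in actual, loop over actual and check whether any ratio exceeds 0.8. (The worst case is that every value needs this, giving a complexity of O(n^2).)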

actual={"Appl", "Orange", "Ornge", "Peace"}
predicted=["Red", "Apple", "Green", "Peace", "Orange"]

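result = set()  # {} would create a dict, which has no add()

for pre in predicted:
    if pre in actual:
        # exact match: O(1) set lookup
        result.add(pre)
    else:
        # no exact match: fall back to fuzzy comparison
        for act in actual:
            if similar(pre, act) > 0.8:
                result.add(pre)
                break  # one fuzzy match is enough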
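
One possible way to reduce the cost of the fuzzy fallback, sketched here as an assumption rather than as part of the original answer: reuse a single SequenceMatcher and screen candidates with its cheap upper bounds, real_quick_ratio() and quick_ratio(), before calling ratio(). The function name fuzzy_matches is hypothetical. Note that this sketch passes the actual string as the first sequence, and the difflib documentation cautions that ratio() can depend on argument order, so borderline pairs could in principle differ from similar(pre, act).

```python
from difflib import SequenceMatcher


def fuzzy_matches(predicted, actual, threshold=0.8):
    """Return the predicted strings with an exact or fuzzy match in actual."""
    actual_set = set(actual)
    matcher = SequenceMatcher()  # reused for every comparison
    result = set()
    for pre in predicted:
        if pre in actual_set:
            result.add(pre)  # exact match, no ratio needed
            continue
        # SequenceMatcher caches data about the second sequence,
        # so set it once per predicted string.
        matcher.set_seq2(pre)
        for act in actual_set:
            matcher.set_seq1(act)
            # Both quick ratios are upper bounds on ratio(), so a
            # candidate that fails either one cannot pass the threshold.
            if (matcher.real_quick_ratio() > threshold
                    and matcher.quick_ratio() > threshold
                    and matcher.ratio() > threshold):
                result.add(pre)
                break
    return result


matches = fuzzy_matches(
    ["Red", "Apple", "Green", "Peace", "Orange"],
    ["Appl", "Orange", "Ornge", "Peace"],
)
print(sorted(matches))  # ['Apple', 'Orange', 'Peace']
```

The worst case is still quadratic in the number of strings; the screening only makes most of the non-matching comparisons cheaper.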

Answer 3

import itertools
{x[1] for x in itertools.product(actual, predicted) if similar(*x) > 0.80}

Answer 4

>>> actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
>>> predicted=["Red", "Apple", "Green", "Peace", "Orange"]
>>> set(actual) & set(predicted)
set(['Orange', 'Peace', 'Apple'])

Answer 5

In this case, you only need to check whether the i-th element of the predicted list is in the actual list. If it is, add it to a new list.

In [2]: actual=["Apple", "Appl", "Orange", "Ornge", "Peace"]
...: predicted=["Red", "Apple", "Green", "Peace", "Orange"]


In [3]: [i for i in predicted if i in actual]
Out[3]: ['Apple', 'Peace', 'Orange']

Answer 6

A simple (but inefficient) approach would be:

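from difflib import SequenceMatcher

counter = 0
for pred in predicted:
    # count a predicted item once if any actual item is more than 80% similar
    if any(SequenceMatcher(None, act, pred).ratio() > 0.8 for act in actual):
        counter += 1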

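This gives you the number of fuzzily matched elements, which is what you want, not just the identical elements (which is what most of the other answers provide).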

Answer 7

First take the intersection of the two sets:

actual, predicted = set(actual), set(predicted)

exact = actual.intersection(predicted)

If this includes all of your actual words, you are done. Otherwise,

if len(exact) < len(actual):
    fuzzy = [word for word in actual-predicted for match in predicted if similar(word, match)>0.8]

Finally, your result set is exact.union(set(fuzzy))
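
To see what these steps produce on the question's sample lists, here is a self-contained sketch that combines them. It uses a set comprehension instead of a list, which is a small stylistic change and not part of the original answer. Because the comprehension iterates over actual - predicted, the fuzzy hits are words taken from actual, so the result contains "Appl" and "Ornge" rather than "Apple".

```python
from difflib import SequenceMatcher


def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


actual = {"Appl", "Orange", "Ornge", "Peace"}
predicted = {"Red", "Apple", "Green", "Peace", "Orange"}

exact = actual & predicted  # {'Orange', 'Peace'}

# Only words of actual without an exact partner need fuzzy comparison.
fuzzy = {
    word
    for word in actual - predicted
    for match in predicted
    if similar(word, match) > 0.8
}

result = exact | fuzzy
print(sorted(result))  # ['Appl', 'Orange', 'Ornge', 'Peace']
```

This counts matched words on the actual side (4 here). To count matched words on the predicted side instead, as the question asks, swap the roles of actual and predicted in the comprehension.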

Answer 8

You can also try the following approach to meet your requirement:

import itertools

fuzlist = [ "Appl", "Orange", "Ornge", "Peace"]
actlist = ["Red", "Apple", "Green", "Peace", "Orange"]
foundlist = []
for fuzname in fuzlist:
    for name in actlist:
        for actname in itertools.permutations(name):
            if fuzname.lower() in ''.join(actname).lower():
                foundlist.append(name)
                break

print(set(foundlist))

8 answers

Answer 1
2 Accepted 2017-06-21 12:43:39

Answer 2
1 2017-06-21 12:37:08

Answer 3
1 2017-06-21 12:51:09

Answer 4
0 2017-06-21 12:36:31

Answer 5
0 2017-06-21 12:46:23

Answer 6
0 2017-06-21 12:47:30

Answer 7
0 2017-06-21 12:52:33

Answer 8
0 2017-06-21 14:16:44
