Python: loop problem when comparing two lists

Question

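I have a small problem. I am trying to compare the words in two lists to compute a similarity percentage, but if the same word appears twice in each list, the percentage I get is wrong.

First, I wrote this small script: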

data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']
res = 0
nb = (len(data1) + len(data2)) / 2
if data1 and data2 and nb != 0:
    for id1, item1 in enumerate(data1):
        for id2, item2 in enumerate(data2):
            if item1 == item2:
                res += 1 - abs(id1 - id2) / nb
    print(res / nb * 100)

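The problem is that if the same word appears twice in the lists, the percentage is greater than 100%. To fix this, I added a "break" right after the line "res += 1 - abs(id1 - id2) / nb", but the percentage is still wrong.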
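To make the symptom concrete, here is a minimal sketch that wraps the same scoring loop in a function and runs it with and without the "break". The names naive_score and stop_at_first_match are made up for this illustration, and Python 3 division is assumed.

```python
def naive_score(data1, data2, stop_at_first_match=False):
    """Reproduce the scoring loop from the script above as a function."""
    nb = (len(data1) + len(data2)) / 2
    if nb == 0:
        return 0.0
    res = 0.0
    for id1, item1 in enumerate(data1):
        # The inner scan always starts at the beginning of data2.
        for id2, item2 in enumerate(data2):
            if item1 == item2:
                res += 1 - abs(id1 - id2) / nb
                if stop_at_first_match:
                    break
    return res / nb * 100


words = ['test', 'super', 'class', 'test', 'boom']
without_break = naive_score(words, words)
with_break = naive_score(words, words, stop_at_first_match=True)
print(round(without_break, 2), round(with_break, 2))
```

For two identical lists this gives roughly 116 without the break (each 'test' is also paired with the other 'test') and roughly 88 with it (the second 'test' is paired with the first one, three positions away), so neither variant reports 100.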

I hope you understand my problem. Thanks for your help!

Answer 1

You can use difflib.SequenceMatcher to measure how similar two lists are. Try this:

from difflib import SequenceMatcher as sm
data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']
matching_percentage = sm(None, data1, data2).ratio() * 100
print(matching_percentage)

Output:

100.0
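As a rough illustration of what ratio() measures, the sketch below compares the list against a shorter one and recomputes the value by hand as 2 * M / T, where M is the number of matched elements and T is the combined length of both sequences. The list named shorter is made up for this example.

```python
from difflib import SequenceMatcher

data1 = ['test', 'super', 'class', 'test', 'boom']
shorter = ['test', 'super', 'class']  # hypothetical second list

matcher = SequenceMatcher(None, data1, shorter)

# Sum the sizes of all matching blocks (the final dummy block has size 0).
matched = sum(block.size for block in matcher.get_matching_blocks())
total = len(data1) + len(shorter)

manual = 2.0 * matched / total * 100
print(matched, total, manual, matcher.ratio() * 100)
```

Because the result is normalised by the combined length, repeated words cannot push it above 100.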

Answer 2

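from collections import defaultdict

data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']

# Count how many times each word occurs in each list.
dic1 = defaultdict(int)
dic2 = defaultdict(int)

for i in data1:
    dic1[i] += 1

for i in data2:
    dic2[i] += 1

# Sum the count differences over every word seen in either list, so that
# a word missing from one list also lowers the score.
count = 0

for i in set(dic1) | set(dic2):
    count += abs(dic2[i] - dic1[i])

# float() keeps the division exact on Python 2 as well.
result = (1 - count / float(len(data1) + len(data2))) * 100
print(result)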

Output:

100.0
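The same position-independent idea can be sketched with collections.Counter, scoring the occurrences the two lists share rather than the differences in their counts. The function name multiset_similarity is hypothetical, and treating two empty lists as identical is an assumption of this sketch.

```python
from collections import Counter


def multiset_similarity(data1, data2):
    """Percentage of word occurrences shared by both lists, ignoring order."""
    total = len(data1) + len(data2)
    if total == 0:
        return 100.0  # assumption: two empty lists count as identical
    # Counter & Counter keeps, for each word, the smaller of the two counts.
    shared = sum((Counter(data1) & Counter(data2)).values())
    return 2.0 * shared / total * 100


data1 = ['test', 'super', 'class', 'test', 'boom']
print(multiset_similarity(data1, data1))
print(multiset_similarity(data1, ['test', 'super', 'boom']))
```

Note that this ignores word order entirely, so it only fits if position should not affect the score.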

Answer 3

Try this code:

data1 = ['test', 'super', 'class', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'class', 'test', 'boom']
res = 0
nb = (len(data1) + len(data2)) / 2.0

def pos_iter(index, sz):
    # Yield every position in range(sz), nearest to index first.
    if index < sz:
        yield index
    i1 = min(index, sz) - 1  # clamp in case data1 is longer than data2
    i2 = index + 1
    # "or" keeps scanning until both directions are exhausted.
    while i1 >= 0 or i2 < sz:
        if i1 >= 0:
            yield i1
            i1 -= 1
        if i2 < sz:
            yield i2
            i2 += 1

if data1 and data2 and nb != 0:
    for id1, item1 in enumerate(data1):
        for id2 in pos_iter(id1, len(data2)):
            item2 = data2[id2]
            if item1 == item2:
                res += max(0, 1 - abs(id1 - id2) / nb)
                break  # count each word of data1 at most once
    print(res / nb * 100)

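The problem with your code is that you always search data2 for the matching word starting from the beginning. If a word is repeated, this gives you wrong values. You need to search "around" the word's position in data1, because you want to find the nearest occurrence.

You also need to break after adding to res, otherwise a text with repeated words will score above 1.0. Your nb variable needs to be a float (otherwise Python 2 truncates the result of the division). And you should make sure 1 - abs(id1 - id2) / nb does not go below zero, which is why I added max(0, ...).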
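The nearest-first search described above can also be sketched as a reusable function. This is only an illustration: nearest_first and positional_similarity are hypothetical names, and the sorted() call stands in for a hand-written position generator.

```python
def nearest_first(index, size):
    """Return every position in range(size), closest to index first."""
    # sorted() is stable, so on a tie the lower position comes first.
    return sorted(range(size), key=lambda pos: abs(pos - index))


def positional_similarity(data1, data2):
    """Score each word of data1 against its nearest occurrence in data2."""
    if not data1 or not data2:
        return 0.0
    nb = (len(data1) + len(data2)) / 2.0
    res = 0.0
    for id1, item1 in enumerate(data1):
        for id2 in nearest_first(id1, len(data2)):
            if item1 == data2[id2]:
                res += max(0, 1 - abs(id1 - id2) / nb)
                break  # count each word of data1 at most once
    return res / nb * 100


words = ['test', 'super', 'class', 'class', 'test', 'boom']
print(nearest_first(2, 5))
print(positional_similarity(words, words))
print(positional_similarity(['a', 'b'], ['b', 'a']))
```

This sketch does not stop two words of data1 from matching the same position in data2, and lists of different lengths can still skew the score, so it is a starting point rather than a complete metric.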

Answer 1
1 Accepted 2019-06-14 08:18:12

Answer 2
0 2019-06-14 08:33:06

Answer 3
0 2019-06-14 08:44:19
