Counting strings in a list in Python, then filtering and matching

Question

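I have a list of words. Using Python 3, I compute the number of differing letters for every combination of two words (with a clever diff_summing algorithm from this site):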

import itertools

def diff_letters(a,b):
    return sum ( a[i] != b[i] for i in range(len(a)) )

w = ['AAHS','AALS','DAHS','XYZA']

for x,y in itertools.combinations(w,2):
    if diff_letters(x,y) == 1:
        print(x,y)

This prints:

AAHS AALS
AAHS DAHS

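My question: how can I count and record that the strings "DAHS" and "AALS" each have only one partner, while "AAHS" has two? I want to keep only the directed combinations in which the target_word has exactly one near_matching_word, so that my final data (as JSON) looks like this: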

[
 {
   "target_word": "DAHS",
   "near_matching_word": "AAHS"
 },
 {
   "target_word": "AALS",
   "near_matching_word": "AAHS"
 }
]

(Note that AAHS does not appear as a target_word.)
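The desired records are directed (target, partner) pairs. One option is to generate directed pairs directly with itertools.permutations, which yields both (x, y) and (y, x), so the target is always the first element. Below is a minimal sketch of that approach. It assumes equal-length words, since zip stops at the shorter word, and neighbours is a hypothetical helper name.

```python
import itertools
import json


def diff_letters(a, b):
    # Hamming distance; assumes equal-length words.
    return sum(ca != cb for ca, cb in zip(a, b))


w = ['AAHS', 'AALS', 'DAHS', 'XYZA']

# permutations() yields both (x, y) and (y, x), so in every directed
# pair the target word is the first element.
neighbours = {}
for target, other in itertools.permutations(w, 2):
    if diff_letters(target, other) == 1:
        neighbours.setdefault(target, []).append(other)

# Keep only targets with exactly one near match.
result = [
    {"target_word": target, "near_matching_word": matches[0]}
    for target, matches in neighbours.items()
    if len(matches) == 1
]
print(json.dumps(result, indent=1))
```

The trade-off is that every unordered pair is compared twice, so this makes roughly twice as many diff_letters calls as the combinations() loop.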

I have a version that uses functools.reduce:

import itertools
import functools
import operator

def diff_letters(a,b):
    return sum ( a[i] != b[i] for i in range(len(a)) )

w = ['AAHS','AALS','DAHS','XYZA']

pairs = []
for x,y in itertools.combinations(w,2):
    if diff_letters(x,y) == 1:
        #print(x,y)
        pairs.append((x,y))

full_list = functools.reduce(operator.add, pairs)
for x in full_list:
    if full_list.count(x) == 1:
        print (x)

which prints:

AALS
DAHS
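Two parts of this version slow down as pairs grows. First, functools.reduce(operator.add, pairs) builds a new, longer tuple at every step. Second, full_list.count(x) rescans the whole list for every word. In addition, reduce raises TypeError when pairs is empty, because no initial value is given. Below is a minimal sketch of the same filtering step using itertools.chain.from_iterable and collections.Counter, which count every word in a single pass. The hard-coded pairs list stands in for the one built by the loop above.

```python
import collections
import itertools

# Stand-in for the pairs list built by the loop above.
pairs = [('AAHS', 'AALS'), ('AAHS', 'DAHS')]

# Flatten lazily and count every word in one pass.
counts = collections.Counter(itertools.chain.from_iterable(pairs))

# Words that occur in exactly one pair.
singles = [word for word, n in counts.items() if n == 1]
print(singles)  # ['AALS', 'DAHS']
```

With an empty pairs list, this simply produces an empty Counter and an empty result instead of an exception.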

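But then I would have to go back to my big list pairs to find the near_matching_word. Of course, in my final version the pairs list will be much larger, and the target_word can be either the first or the second item of the tuple (x, y).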
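For a much larger word list, the combinations() loop itself becomes the bottleneck, because it compares every pair of words. A common alternative is to bucket words by wildcard patterns. This technique is swapped in here; it is not taken from the question or the answers. Two distinct words of the same length differ in exactly one letter if and only if they share a pattern in which that position is replaced by a placeholder. The sketch below assumes the words are distinct, all the same length, and never contain '_'. neighbour_map is a hypothetical helper name. Because the sketch builds a word-to-neighbours map, the near_matching_word can be looked up directly, whichever side of a pair the target was on.

```python
import collections


def neighbour_map(words):
    """Map each word to the words that differ from it in exactly one position.

    Assumes the words are distinct, have the same length, and never contain '_'.
    """
    buckets = collections.defaultdict(list)
    for word in words:
        for i in range(len(word)):
            # e.g. 'AAHS' goes into '_AHS', 'A_HS', 'AA_S' and 'AAH_'
            buckets[word[:i] + '_' + word[i + 1:]].append(word)

    neighbours = collections.defaultdict(list)
    for group in buckets.values():
        # Words in the same bucket agree everywhere except the wildcard position.
        for word in group:
            neighbours[word].extend(other for other in group if other != word)
    return neighbours


w = ['AAHS', 'AALS', 'DAHS', 'XYZA']
nb = neighbour_map(w)
result = [
    {"target_word": word, "near_matching_word": others[0]}
    for word, others in nb.items()
    if len(others) == 1
]
print(result)
```

Each word goes into one bucket per letter position. The work is therefore roughly proportional to the number of words times the word length, plus the cost of comparing words within each bucket.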

Answer 1

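The other answers keep every pair, even when a word has more than one. Since the extra pairs are not needed, that seems to waste memory. This answer keeps at most one pair per string.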

import collections
import itertools

def diff_letters(a,b):
    return sum ( a[i] != b[i] for i in range(len(a)) )

w = ['AAHS','AALS','DAHS','XYZA']

# Marker for pairs that have not been found yet.
NOT_FOUND = object()

# Collection of found pairs x => y. Each item is in one of three states:
# - y is NOT_FOUND if x has not been seen yet
# - y is a string if it is the only accepted pair for x
# - y is None if there is more than one accepted pair for x
pairs = collections.defaultdict(lambda: NOT_FOUND)

for x,y in itertools.combinations(w,2):
    if diff_letters(x,y) == 1:
        if pairs[x] is NOT_FOUND:
            pairs[x] = y
        else:
            pairs[x] = None
        if pairs[y] is NOT_FOUND:
            pairs[y] = x
        else:
            pairs[y] = None

# Remove None's and change into normal dict.
pairs = {x: y for x, y in pairs.items() if y}

for x, y in pairs.items():
    print("Target = {}, Only near matching word = {}".format(x, y))

Output:

Target = AALS, Only near matching word = AAHS
Target = DAHS, Only near matching word = AAHS
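The question asks for the result as JSON, but this answer prints plain text. Below is a minimal sketch of turning the filtered pairs dictionary into the requested shape with the standard json module. The dictionary is hard-coded with the values shown in the output above so that the snippet runs on its own.

```python
import json

# Result of the loop above: target word -> its only near-matching word
# (hard-coded here so the sketch is self-contained).
pairs = {'AALS': 'AAHS', 'DAHS': 'AAHS'}

records = [
    {"target_word": target, "near_matching_word": match}
    for target, match in pairs.items()
]
print(json.dumps(records, indent=1))
```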

Answer 2

You can use a dictionary instead of a list of pairs:

pairs = {}
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.setdefault(x, []).append(y)
        pairs.setdefault(y, []).append(x)

result = [
    {"target_word": key, "near_matching_word": head}
    for key, (head, *tail) in pairs.items()
    if not tail
]

print(result)

Output

[{'target_word': 'AALS', 'near_matching_word': 'AAHS'}, {'target_word': 'DAHS', 'near_matching_word': 'AAHS'}]

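In the pairs dictionary, the keys are the target_words and the values are lists of their near_matching_words. The list comprehension then filters out every key that has more than one near_matching_word.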
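The filtering relies on extended iterable unpacking: (head, *tail) binds the first near-matching word to head and collects any remaining words in the list tail. An empty tail is falsy. Below is a small demonstration using the dictionary this code builds for the sample words. Unpacking an empty list this way would raise ValueError, but that cannot happen here, because every key is created together with its first value.

```python
# The dictionary built above for the sample words.
pairs = {'AAHS': ['AALS', 'DAHS'], 'AALS': ['AAHS'], 'DAHS': ['AAHS']}

for key, (head, *tail) in pairs.items():
    print(key, head, tail)
# AAHS AALS ['DAHS']
# AALS AAHS []
# DAHS AAHS []

# An empty tail is falsy, so `if not tail` keeps exactly the one-match keys.
single = {key: head for key, (head, *tail) in pairs.items() if not tail}
```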

Answer 3

import itertools
import functools
import operator


def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))


w = ['AAHS', 'AALS', 'DAHS', 'XYZA']

pairs = []
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.append((x, y))

full_list = functools.reduce(operator.add, pairs)

result = []

for x in set(full_list):
    if full_list.count(x) == 1:
        pair = next((i for i in pairs if x in i))
        match = [i for i in pair if i != x][0]
        result.append({
            "target_word": x,
            "near_matching_word": match
        })

print(result)

Output:

[{'target_word': 'DAHS', 'near_matching_word': 'AAHS'}, {'target_word': 'AALS', 'near_matching_word': 'AAHS'}]
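For every word, this answer calls full_list.count(x) and then searches pairs with next(). Each of those is a full scan of a list, so the work grows quadratically with the number of pairs. The answer also iterates over set(full_list), and the order of a set of strings can change between runs because string hashing is randomized per process. That is why the output order differs from the question's. Below is a sketch that produces the same records but counts once with collections.Counter and takes the partner straight from each pair, so the output follows the order of pairs.

```python
import collections
import itertools


def diff_letters(a, b):
    # Hamming distance; assumes equal-length words.
    return sum(ca != cb for ca, cb in zip(a, b))


w = ['AAHS', 'AALS', 'DAHS', 'XYZA']
pairs = [(x, y) for x, y in itertools.combinations(w, 2) if diff_letters(x, y) == 1]

# One pass to count how many pairs each word appears in.
counts = collections.Counter(itertools.chain.from_iterable(pairs))

# One more pass over the pairs: the partner is simply the other element,
# so no next()/list search is needed.
result = []
for x, y in pairs:
    for target, match in ((x, y), (y, x)):
        if counts[target] == 1:
            result.append({"target_word": target, "near_matching_word": match})

print(result)
```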

Question description

3 solutions

Solution 1: 1 2019-10-23 14:37:40
Solution 2: 0 2019-10-23 14:26:35
Solution 3: 0 2019-10-23 14:30:29
