Counting strings in lists and then filtering & matching, in Python
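I have a list of words, and using python3 I count the number of differing letters between every combination of two words (using a clever diff-summing algorithm from this site):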
import itertools
def diff_letters(a,b):
    return sum ( a[i] != b[i] for i in range(len(a)) )
w = ['AAHS','AALS','DAHS','XYZA']
for x,y in itertools.combinations(w,2):
    if diff_letters(x,y) == 1:
        print(x,y)
This prints:
AAHS AALS
AAHS DAHS
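My question: how can I count and record that the strings "DAHS" and "AALS" have exactly one partner, while "AAHS" has two? I want to filter the directional combinations down to those where each target_string has exactly one near_matching_word, so that my final data (as JSON) looks like this: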
[
  {
    "target_word": "DAHS",
    "near_matching_word": "AAHS"
  },
  {
    "target_word": "AALS",
    "near_matching_word": "AAHS"
  }
]
(Note that AAHS does not appear as a target_word.)
I have a version that uses functools.reduce:
import itertools
import functools
import operator
def diff_letters(a,b):
    return sum ( a[i] != b[i] for i in range(len(a)) )
w = ['AAHS','AALS','DAHS','XYZA']
pairs = []
for x,y in itertools.combinations(w,2):
    if diff_letters(x,y) == 1:
        #print(x,y)
        pairs.append((x,y))
full_list = functools.reduce(operator.add, pairs)
for x in full_list:
    if full_list.count(x) == 1:
        print (x)
which prints
AALS
DAHS
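But then I would have to go back to my big list pairs to find the near_matching_word. Of course, in my final version the list pairs will be much larger, and the target_word can be either the first or the second item in the tuple (x, y).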
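The other answers keep all pairs, even when more than one is found for a word. Since those pairs are not needed, that seems to waste memory. This answer keeps at most one pair for each string.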
import collections
import itertools
def diff_letters(a,b):
    return sum ( a[i] != b[i] for i in range(len(a)) )
w = ['AAHS','AALS','DAHS','XYZA']
# Marker for pairs that have not been found yet.
NOT_FOUND = object()
# Collection of found pairs x => y. Each item is in one of three states:
# - y is NOT_FOUND if x has not been seen yet
# - y is a string if it is the only accepted pair for x
# - y is None if there is more than one accepted pair for x
pairs = collections.defaultdict(lambda: NOT_FOUND)
for x,y in itertools.combinations(w,2):
    if diff_letters(x,y) == 1:
        if pairs[x] is NOT_FOUND:
            pairs[x] = y
        else:
            pairs[x] = None
        if pairs[y] is NOT_FOUND:
            pairs[y] = x
        else:
            pairs[y] = None
# Remove None's and change into normal dict.
pairs = {x: y for x, y in pairs.items() if y}
for x, y in pairs.items():
    print("Target = {}, Only near matching word = {}".format(x, y))
Output:
Target = AALS, Only near matching word = AAHS
Target = DAHS, Only near matching word = AAHS
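To get from this dictionary to the JSON layout asked for in the question, something like the following might work. It is only a minimal sketch: the pairs literal simply repeats the result printed above so the snippet runs on its own, and to_records is a hypothetical helper name, not part of the original answer.

```python
import json

# Result of the loop above, repeated here so the snippet is self-contained.
pairs = {"AALS": "AAHS", "DAHS": "AAHS"}


def to_records(pairs):
    """Hypothetical helper: turn {target: match} into a list of dicts."""
    return [
        {"target_word": target, "near_matching_word": match}
        for target, match in pairs.items()
    ]


records = to_records(pairs)
# json.dumps serializes the list of dicts as a JSON array of objects.
print(json.dumps(records, indent=2))
```

Because each target maps to a single string, no second lookup into a list of pairs is needed.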
You can use a dictionary instead of a list of pairs:
pairs = {}
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.setdefault(x, []).append(y)
        pairs.setdefault(y, []).append(x)
result = [
    {"target_word": key, "near_matching_word": head}
    for key, (head, *tail) in pairs.items()
    if not tail
]
print(result)
Output
[{'target_word': 'AALS', 'near_matching_word': 'AAHS'}, {'target_word': 'DAHS', 'near_matching_word': 'AAHS'}]
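In the pairs dictionary, the keys are the target_words and the values are lists of their near_matching_words. The list comprehension then filters out the keys that have more than one near_matching_word.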
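For reference, the dictionary approach can be written as a standalone script along these lines. This is a sketch rather than the answerer's exact code: single_partner_words is a hypothetical function name, and diff_letters uses zip, which assumes all words have the same length (zip silently stops at the shorter string).

```python
import itertools


def diff_letters(a, b):
    # Number of positions at which two equal-length strings differ.
    return sum(ca != cb for ca, cb in zip(a, b))


def single_partner_words(words):
    """Hypothetical helper: words with exactly one near match, as dicts."""
    partners = {}
    for x, y in itertools.combinations(words, 2):
        if diff_letters(x, y) == 1:
            # Record the match in both directions.
            partners.setdefault(x, []).append(y)
            partners.setdefault(y, []).append(x)
    return [
        {"target_word": word, "near_matching_word": matches[0]}
        for word, matches in partners.items()
        if len(matches) == 1
    ]


result = single_partner_words(['AAHS', 'AALS', 'DAHS', 'XYZA'])
print(result)
```

Recording each match under both words is what removes the need to check whether the target is the first or the second item of a tuple.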
import itertools
import functools
import operator
def diff_letters(a, b):
    return sum(a[i] != b[i] for i in range(len(a)))
w = ['AAHS', 'AALS', 'DAHS', 'XYZA']
pairs = []
for x, y in itertools.combinations(w, 2):
    if diff_letters(x, y) == 1:
        pairs.append((x, y))
full_list = functools.reduce(operator.add, pairs)
result = []
for x in set(full_list):
    if full_list.count(x) == 1:
        pair = next((i for i in pairs if x in i))
        match = [i for i in pair if i != x][0]
        result.append({
            "target_word": x,
            "near_matching_word": match
        })
print(result)
Output (the order of the entries may vary, because the loop iterates over a set):
[{'target_word': 'DAHS', 'near_matching_word': 'AAHS'}, {'target_word': 'AALS', 'near_matching_word': 'AAHS'}]
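Note that full_list.count(x) rescans the whole tuple for every word, and the next(...) lookup rescans pairs, which may become slow for a large word list. One possible alternative, sketched here with collections.Counter (a swapped-in technique, not what the code above uses), counts every word in a single pass and then walks pairs once. It assumes all words have the same length.

```python
import collections
import itertools


def diff_letters(a, b):
    # Number of positions at which two equal-length strings differ.
    return sum(ca != cb for ca, cb in zip(a, b))


w = ['AAHS', 'AALS', 'DAHS', 'XYZA']
pairs = [(x, y) for x, y in itertools.combinations(w, 2)
         if diff_letters(x, y) == 1]

# Count how many pairs each word takes part in, in a single pass.
counts = collections.Counter(itertools.chain.from_iterable(pairs))

result = []
for x, y in pairs:
    # Check both directions, since the target can be either tuple item.
    for target, match in ((x, y), (y, x)):
        if counts[target] == 1:
            result.append({"target_word": target, "near_matching_word": match})

print(result)
```

Iterating over pairs instead of a set also makes the order of the result deterministic.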