How do I find a substring match in a list only if it completely contains a substring of another string in a list?
These are the 2 lists:
list1 = ['apple pie', 'apple cake', 'the apple pie', 'the apple cake', 'apple']
list2 = ['apple', 'lots of apple', 'here is an apple', 'humungous apple', 'carrot cake']
I have tried an algorithm called longest Substring finder but, as the name suggests, it does not return what I am looking for.
def longestSubstringFinder(string1, string2):
    # Character-by-character longest common substring.
    # "NULL" is a sentinel; since len("NULL") == 4, only matches of 5+ characters replace it.
    answer = "NULL"
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if i + j < len1 and string1[i + j] == string2[j]:
                match += string2[j]
            else:
                if len(match) > len(answer):
                    answer = match
                match = ""
    return answer

mylist = []

def call():
    # file_names_short and company_list are the asker's own lists (not shown in the question)
    for i in file_names_short:
        s1 = i
        for j in company_list:
            s2 = j
            s1 = s1.lower()
            s2 = s2.lower()
            # Keep extracting the longest common substring until none is left
            while longestSubstringFinder(s2, s1) != "NULL":
                x = longestSubstringFinder(s2, s1)
                # print(x)
                mylist.append(x)
                s2 = s2.replace(x, ' ')

call()
print('[%s]' % ','.join(map(str, mylist)))
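For comparison, the standard library's difflib.SequenceMatcher.find_longest_match computes a character-level longest common substring, which is what longestSubstringFinder is trying to do. The minimal sketch below (the wrapper name longest_common_substring is hypothetical) runs it pairwise over list1 and list2 to show why a character-level approach cannot produce the expected output: it picks up the space before "apple", and when no word is shared it still returns a single common letter.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> str:
    """Hypothetical helper: character-level longest common substring via difflib."""
    matcher = SequenceMatcher(None, a, b)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    # m.a is the start index in a, m.size the match length (0 if nothing matches)
    return a[m.a:m.a + m.size]


list1 = ['apple pie', 'apple cake', 'the apple pie', 'the apple cake', 'apple']
list2 = ['apple', 'lots of apple', 'here is an apple', 'humungous apple', 'carrot cake']

# Compare the strings pairwise by position
print([longest_common_substring(x, y) for x, y in zip(list1, list2)])
# ['apple', 'apple', ' apple', ' apple', 'a']
```

The last pair returns 'a' instead of '', which is exactly the kind of partial match the question wants to rule out.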
The expected output should be:
output = ['apple', 'apple', 'apple', 'apple', '']
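The word apple will not always be apple: the real input is a larger list with many words, but I am always looking for matching words between the two lists, and apple is always the longest word in list1.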
Another example (possibly clearer):
string1 = ['Walgreens & Co.', 'Amazon Inc']
string2 = ['walgreens customers', 'amazon products', 'other words']
output = ['walgreens', 'amazon', '']
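In this second example the words differ in case and punctuation ('Walgreens & Co.' vs 'walgreens customers'), so a plain split() comparison would miss them. A minimal sketch, assuming strings are paired by position and that a "word" is any run of letters, digits or underscores after lowercasing (regex \w+). The helper names words and longest_shared_word are hypothetical, and padding the shorter list with '' via itertools.zip_longest is an assumption made to reproduce the three-element expected output.

```python
import re
from itertools import zip_longest


def words(text: str) -> set:
    # Lowercase, then keep runs of word characters, so "Co." -> "co" and "&" is dropped
    return set(re.findall(r"\w+", text.lower()))


def longest_shared_word(s1: str, s2: str) -> str:
    common = words(s1) & words(s2)
    if not common:
        return ''
    # Longest word wins; equal lengths are broken alphabetically for a stable result
    return min(common, key=lambda w: (-len(w), w))


string1 = ['Walgreens & Co.', 'Amazon Inc']
string2 = ['walgreens customers', 'amazon products', 'other words']

# Pair strings by position; zip_longest pads the shorter list with '' (assumption)
output = [longest_shared_word(a, b) for a, b in zip_longest(string1, string2, fillvalue='')]
print(output)  # ['walgreens', 'amazon', '']
```

Because whole tokens are compared, a shared letter or a fragment of a word never counts as a match.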
EDIT: Edited to get the longest match
list1 = ['apple pie cucumber', 'apple cake', 'the apple pie', 'the apple cake', 'apple']
list2 = ['apple cucumber', 'lots of apple', 'here is an apple', 'humungous apple', 'carrot cake']
result = []
# Compare the strings pairwise by position
for s1, s2 in zip(list1, list2):
    words1, words2 = s1.split(), s2.split()
    # Keep the whole words of s1 that also appear as whole words in s2
    match = [w for w in words1 if w in words2]
    # Longest shared word, or '' when the pair has no word in common
    longest = max(match, key=len) if match else ''
    result.append(longest)
print(result)
Output:
['cucumber', 'apple', 'apple', 'apple', '']
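One detail of the code above worth knowing: when two shared words have the same length, max(..., key=len) returns the first maximal element it encounters, so the tie is resolved by the word order in the list1 string. A small sketch reproducing the same logic in a function (the name longest_common_word and the 'pear plum' inputs are hypothetical, for illustration only):

```python
def longest_common_word(s1: str, s2: str) -> str:
    # Same logic as the code above: whole words of s1 that also appear in s2
    words2 = s2.split()
    match = [w for w in s1.split() if w in words2]
    # max() keeps the first maximal element, so ties follow s1's word order
    return max(match, key=len) if match else ''


print(longest_common_word('pear plum', 'plum pear'))  # pear
print(longest_common_word('plum pear', 'pear plum'))  # plum
```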