How do I find a substring match in a list only if it completely contains a substring of another string in a list?

These are the two lists:

list1 = ['apple pie', 'apple cake', 'the apple pie', 'the apple cake', 'apple']

list2 = ['apple', 'lots of apple', 'here is an apple', 'humungous apple', 'carrot cake']

I tried an algorithm called longest substring finder, but, as its name suggests, it does not return what I am looking for.

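def longestSubstringFinder(string1, string2):
    # "NULL" is a sentinel; because len("NULL") == 4, only matches of
    # 5 or more characters can ever replace it
    answer = "NULL"
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        # align string2[0] with string1[i]; offsets where string2 would
        # start before string1 are never tried
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
        # a match that runs to the end of string2 must also be considered
        if (len(match) > len(answer)): answer = match
    return answer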


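mylist = []

# Assumption: file_names_short and company_list are the asker's real data;
# here they are pointed at list1 and list2 from above so the code can run
file_names_short = list1
company_list = list2

def call():
    # compares every item of file_names_short with every item of company_list
    for i in file_names_short:
        s1 = i
        for j in company_list:
            s2 = j
            s1 = s1.lower()
            s2 = s2.lower()
            # repeatedly take the longest common substring and blank it out
            # of s2, so the next pass finds a different one
            while(longestSubstringFinder(s2,s1) != "NULL"):
                x = longestSubstringFinder(s2,s1)
                # print(x)
                mylist.append(x)
                s2 = s2.replace(x, ' ')

call()
print('[%s]' % ','.join(map(str, mylist)))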

The expected output should be:

output = ['apple', 'apple', 'apple', 'apple', '']

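The word is not always apple; the real list is much larger and contains many words, but I am always looking for the words that match between the two lists, and apple is always the longest word in list1.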
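Part of the problem is that a character-level longest common substring, which is what the attempt above computes, has no notion of word boundaries: the result can start or end with a space or cut a word in half. A minimal sketch using the standard library's `difflib.SequenceMatcher.find_longest_match` shows this on pairs taken from list1 and list2; the helper name `longest_common_substring` is only for illustration.

```python
from difflib import SequenceMatcher


def longest_common_substring(a, b):
    # character-level longest common substring; with no junk filter,
    # find_longest_match returns the longest matching block of a and b
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


print(repr(longest_common_substring('the apple pie', 'here is an apple')))  # ' apple'
print(repr(longest_common_substring('apple cake', 'carrot cake')))          # ' cake'
```

Neither result is a clean word (both carry a leading space, and ' cake' is not the match the question wants for that pair), which suggests comparing whole words rather than characters.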

Another example (possibly clearer):

string1 = ['Walgreens & Co.', 'Amazon Inc']
string2 = ['walgreens customers', 'amazon products', 'other words']
output = ['walgreens', 'amazon', '']
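The second example also needs case-insensitive matching with punctuation stripped ('Walgreens' vs 'walgreens', 'Co.'), and the two lists have different lengths. Below is a minimal sketch, assuming items are compared by position and that an item with no partner should produce ''. The helpers `words` and `longest_shared_word` are hypothetical names, and treating `\w+` runs as words is an assumption about what counts as a word.

```python
import re
from itertools import zip_longest


def words(text):
    # lowercase, then keep runs of word characters (letters, digits, underscore),
    # so 'Walgreens & Co.' becomes ['walgreens', 'co']
    return re.findall(r"\w+", text.lower())


def longest_shared_word(s1, s2):
    # whole words of s1 that also occur in s2; the longest wins, '' if none
    shared = set(words(s2))
    matches = [w for w in words(s1) if w in shared]
    return max(matches, key=len) if matches else ''


string1 = ['Walgreens & Co.', 'Amazon Inc']
string2 = ['walgreens customers', 'amazon products', 'other words']

# compare by position; zip_longest pads the shorter list with '' so every
# position still gets an entry
output = [longest_shared_word(a, b)
          for a, b in zip_longest(string1, string2, fillvalue='')]
print(output)  # ['walgreens', 'amazon', '']
```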

Edit: changed to return the longest match

list1 = ['apple pie cucumber', 'apple cake', 'the apple pie', 'the apple cake', 'apple']
list2 = ['apple cucumber', 'lots of apple', 'here is an apple', 'humungous apple', 'carrot cake']

result = []

# compare the lists pairwise: list1[0] with list2[0], list1[1] with list2[1], ...
for s1, s2 in zip(list1, list2):
    words2 = set(s2.split())
    # whole words of s1 that also appear as whole words in s2
    match = [w for w in s1.split() if w in words2]

    # keep the longest shared word, or '' when the pair shares none
    longest = max(match, key=len) if match else ''
    result.append(longest)

print(result)

Output:

['cucumber', 'apple', 'apple', 'apple', '']
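If "longest match" should instead mean the longest run of consecutive shared words (for example a multi-word name), `SequenceMatcher` also accepts lists of words instead of strings. This is a sketch under that assumption, with `longest_shared_phrase` a hypothetical helper and 'lots of apple pie' an illustrative input not taken from the lists. Because it measures length in words, for 'apple pie cucumber' and 'apple cucumber' it returns 'apple' (the earliest of two one-word runs) rather than 'cucumber'.

```python
from difflib import SequenceMatcher


def longest_shared_phrase(s1, s2):
    # SequenceMatcher works on any sequences of hashable items, so word lists work too
    w1, w2 = s1.split(), s2.split()
    m = SequenceMatcher(None, w1, w2).find_longest_match(0, len(w1), 0, len(w2))
    # m.size counts words, not characters; size 0 yields ''
    return ' '.join(w1[m.a:m.a + m.size])


print(longest_shared_phrase('the apple pie', 'lots of apple pie'))    # apple pie
print(longest_shared_phrase('apple pie cucumber', 'apple cucumber'))  # apple
print(repr(longest_shared_phrase('apple', 'carrot cake')))            # ''
```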


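Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM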