简体   繁体   中英

Python: Compare two strings and return the longest segment that they have in common

As a novice in Python, I have written a working function that will compare two strings and search for the longest substring shared by both strings. For instance, when the function compares "goggle" and "google", it will identify "go" and "gle" as the two common substrings (excluding single letters), but will only return "gle" since it's the longest one.

I would like to know if anywhere part of my code can be improved/re-written, as it may be considered lengthy and convoluted. I'll also be very glad to see other approaches to the solution. Thanks in advance!

def longsub(string1, string2):
    sublist = []
    i=j=a=b=count=length=0

    while i < len(string1):
        while j < len(string2):
            if string1[i:a+1] == string2[j:b+1] and (a+1) <= len(string1) and (b+1) <= len(string2):
                a+=1
                b+=1
                count+=1
            else:
                if count > 0:
                    sublist.append(string1[i:a])
                count = 0
                j+=1
                b=j
                a=i
        j=b=0
        i+=1
        a=i

    while len(sublist) > 1:
        for each in sublist:
            if len(each) >= length:
                length = len(each)
            else:
                sublist.remove(each)

    return sublist[0]

Edit: Comparing "goggle" and "google" may have been a bad example, since they are equal length with longest common segments in the same positions. The actual inputs would be closer to this: "xabcdkejp" and "zkdieaboabcd". Correct output should be "abcd".

在标准库中实际上恰好有一个函数: difflib.SequencMatcher.find_longest_match

EDIT : This algorithm only works when the words have the longest segment in the same indices

You can get away with only one loop. Use helper variables. Something like these (needs refactoring) http://codepad.org/qErRBPav :

word1 = "google"
word2 = "goggle"

longestSegment = ""
tempSegment = ""

for i in range(len(word1)):
    if word1[i] == word2[i]:
        tempSegment += word1[i]
    else: tempSegment = ""

    if len(tempSegment) > len(longestSegment):
        longestSegment = tempSegment

print longestSegment # "gle"

EDIT : mgilson's proposal of using find_longest_match (works for varying positions of the segments):

from difflib import SequenceMatcher

word1 = "google"
word2 = "goggle"

s = SequenceMatcher(None, word1, word2)
match = s.find_longest_match(0, len(word1), 0, len(word2))

print word1[match.a:(match.b+match.size)] # "gle"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM