Python: Compare two strings and return the longest segment that they have in common

As a novice in Python, I have written a working function that compares two strings and searches for the longest substring shared by both. For instance, when the function compares "goggle" and "google", it identifies "go" and "gle" as the two common substrings (excluding single letters), but returns only "gle" since it is the longest.

I would like to know whether any part of my code can be improved or rewritten, as it may be considered lengthy and convoluted. I would also be very glad to see other approaches to the solution. Thanks in advance!

def longsub(string1, string2):
    sublist = []
    i=j=a=b=count=length=0

    while i < len(string1):
        while j < len(string2):
            if string1[i:a+1] == string2[j:b+1] and (a+1) <= len(string1) and (b+1) <= len(string2):
                a+=1
                b+=1
                count+=1
            else:
                if count > 0:
                    sublist.append(string1[i:a])
                count = 0
                j+=1
                b=j
                a=i
        j=b=0
        i+=1
        a=i

    while len(sublist) > 1:
        for each in sublist:
            if len(each) >= length:
                length = len(each)
            else:
                sublist.remove(each)

    return sublist[0]

Edit: Comparing "goggle" and "google" may have been a bad example, since they are of equal length with the longest common segments in the same positions. The actual inputs would be closer to this: "xabcdkejp" and "zkdieaboabcd". The correct output should be "abcd".

There actually happens to be a function for this in the standard library: difflib.SequenceMatcher.find_longest_match
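As a rough sketch of how that function could be applied to the inputs from the question's edit: the helper name `longest_common` is made up for illustration, and passing `autojunk=False` is my own choice (it switches off difflib's heuristic that treats very frequent elements as junk in long sequences), not something the question asks for.

```python
from difflib import SequenceMatcher


def longest_common(s1, s2):
    """Return the longest block that s1 and s2 share, or "" if there is none.

    Hypothetical helper name; autojunk=False is an assumption made here so
    that long inputs are not affected by difflib's "popular element" heuristic.
    """
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    # match.a is the start of the block in s1, match.size is its length
    return s1[match.a:match.a + match.size]


print(longest_common("xabcdkejp", "zkdieaboabcd"))  # abcd
```

When nothing matches, `match.size` is 0 and the slice is simply an empty string, so no special case should be needed.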

EDIT: This algorithm only works when the longest common segment sits at the same indices in both words.

You can get away with only one loop. Use helper variables. Something like this (needs refactoring), http://codepad.org/qErRBPav :

word1 = "google"
word2 = "goggle"

longestSegment = ""
tempSegment = ""

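# Walk both words in lockstep; zip stops at the end of the shorter word,
# so a shorter word2 no longer raises an IndexError.
for c1, c2 in zip(word1, word2):
    if c1 == c2:
        tempSegment += c1
    else:
        tempSegment = ""

    # Remember the longest run of matching characters seen so far
    if len(tempSegment) > len(longestSegment):
        longestSegment = tempSegment

print(longestSegment)  # "gle"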

EDIT: mgilson's proposal of using find_longest_match (works when the segments are at different positions):

from difflib import SequenceMatcher

word1 = "google"
word2 = "goggle"

s = SequenceMatcher(None, word1, word2)
match = s.find_longest_match(0, len(word1), 0, len(word2))

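# The match starts at match.a in word1 and is match.size characters long
# (match.b is the start in word2, so it must not be used to slice word1).
print(word1[match.a:match.a + match.size])  # "gle"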
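Since the question also asks for other approaches, here is a minimal sketch of the classic dynamic-programming way to find the longest common substring without difflib. This is a swapped-in textbook technique, not code from the question or the answers above, and the function name `longest_common_substring` is hypothetical. It runs in O(len(s1) * len(s2)) time and keeps only two rows of the table in memory.

```python
def longest_common_substring(s1, s2):
    """Return the longest substring shared by s1 and s2 ("" if none).

    Hypothetical helper for illustration. On ties, the match that ends
    earliest in s1 is returned.
    """
    # prev[j] = length of the common suffix of s1[:i-1] and s2[:j]
    prev = [0] * (len(s2) + 1)
    best_len = 0
    best_end = 0  # exclusive end index of the best match inside s1

    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # Extend the run that ended at the previous pair of characters
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr

    return s1[best_end - best_len:best_end]


print(longest_common_substring("xabcdkejp", "zkdieaboabcd"))  # abcd
print(longest_common_substring("goggle", "google"))           # gle
```

Compared with the function in the question, this avoids collecting every candidate substring in a list and then removing items from that list while iterating over it.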
