Python: Compare two strings and return the longest segment that they have in common
As a novice in Python, I have written a working function that compares two strings and searches for the longest substring shared by both. For instance, when the function compares "goggle" and "google", it identifies "go" and "gle" as the two common substrings (excluding single letters), but returns only "gle" since it is the longest one.
I would like to know whether any part of my code can be improved or rewritten, as it may be considered lengthy and convoluted. I would also be very glad to see other approaches to the solution. Thanks in advance!
def longsub(string1, string2):
    sublist = []
    i=j=a=b=count=length=0
    while i < len(string1):
        while j < len(string2):
            if string1[i:a+1] == string2[j:b+1] and (a+1) <= len(string1) and (b+1) <= len(string2):
                a+=1
                b+=1
                count+=1
            else:
                if count > 0:
                    sublist.append(string1[i:a])
                count = 0
                j+=1
                b=j
                a=i
        j=b=0
        i+=1
        a=i
    while len(sublist) > 1:
        for each in sublist:
            if len(each) >= length:
                length = len(each)
            else:
                sublist.remove(each)
    return sublist[0]
Edit: Comparing "goggle" and "google" may have been a bad example, since they are of equal length with the longest common segments in the same positions. The actual inputs would be closer to this: "xabcdkejp" and "zkdieaboabcd". The correct output should be "abcd".
There actually happens to be a function for exactly this in the standard library: difflib.SequenceMatcher.find_longest_match
EDIT: This algorithm only works when the longest segment sits at the same indices in both words.
You can get away with only one loop by using helper variables. Something like this (needs refactoring) http://codepad.org/qErRBPav :
word1 = "google"
word2 = "goggle"
longestSegment = ""
tempSegment = ""
# Compare the words position by position (assumes word2 is at least as long as word1)
for i in range(len(word1)):
    if word1[i] == word2[i]:
        tempSegment += word1[i]
    else:
        tempSegment = ""
    if len(tempSegment) > len(longestSegment):
        longestSegment = tempSegment
print(longestSegment)  # "gle"
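To make the limitation of the single-loop idea concrete, here is a minimal sketch of it as a function. The name longest_aligned_segment is made up for illustration, and using zip (which stops at the shorter word) is my assumption for coping with unequal lengths; it is not part of the original snippet.

```python
def longest_aligned_segment(word1, word2):
    """Longest run of characters that match at the same index in both words."""
    longest = ""
    current = ""
    # zip stops at the shorter word, so unequal lengths do not raise IndexError
    for c1, c2 in zip(word1, word2):
        if c1 == c2:
            current += c1
            if len(current) > len(longest):
                longest = current
        else:
            # A mismatch ends the current run
            current = ""
    return longest


print(longest_aligned_segment("google", "goggle"))           # gle
print(longest_aligned_segment("xabcdkejp", "zkdieaboabcd"))  # prints an empty line
```

The second call returns an empty string because "abcd" starts at a different index in each input, which is exactly the case this approach cannot handle.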
EDIT: mgilson's proposal of using find_longest_match (works for varying positions of the segments):
from difflib import SequenceMatcher
word1 = "google"
word2 = "goggle"
s = SequenceMatcher(None, word1, word2)
match = s.find_longest_match(0, len(word1), 0, len(word2))
# match.a is the start of the match in word1, match.size its length
print(word1[match.a:(match.a+match.size)])  # "gle"
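As a rough sketch, the same call can be wrapped in a function and tried on the inputs from the question's edit. The function name longest_common_substring is hypothetical, and passing autojunk=False is my own choice to switch off difflib's heuristic for long sequences; it should make no difference for strings this short.

```python
from difflib import SequenceMatcher


def longest_common_substring(s1, s2):
    """Return the longest contiguous block shared by s1 and s2."""
    # autojunk=False disables the popularity heuristic used on long sequences
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    # match.a is the start in s1, match.b the start in s2, match.size the length
    return s1[match.a:match.a + match.size]


print(longest_common_substring("xabcdkejp", "zkdieaboabcd"))  # abcd
print(longest_common_substring("google", "goggle"))           # gle
```

Slicing s1 from match.a to match.a + match.size matters once the segment sits at different positions in the two strings, as it does for "abcd" here.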