Concatenate two strings with a common substring?
Say I have the strings
string1 = 'Hello how are you'
string2 = 'are you doing now?'
The result should be
Hello how are you doing now?
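I was thinking of using re and different ways of string searching (the longest common substring problem).
But is there a simple way (or a library) to do this in Python?
To make things clear, I'll add one more set of test strings!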
string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'
The result would be
'This is a nice ACADEMY you know!'
This should do:
string1 = 'Hello how are you'
string2 = 'are you doing now?'
i = 0
while not string2.startswith(string1[i:]):
    i += 1
sFinal = string1[:i] + string2
Output:
>>> sFinal
'Hello how are you doing now?'
Or, make it a function so that you can use it again without rewriting:
def merge(s1, s2):
    i = 0
    while not s2.startswith(s1[i:]):
        i += 1
    return s1[:i] + s2
Output:
>>> merge('Hello how are you', 'are you doing now?')
'Hello how are you doing now?'
>>> merge("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
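As a quick sanity check of the edge cases, here is a small self-contained sketch. The function is repeated so the snippet runs on its own, and the two extra inputs are made up for illustration; they are not from the question.

```python
def merge(s1, s2):
    # Advance i until the remaining tail of s1 is a prefix of s2.
    # The loop always terminates: at i == len(s1) the tail is '',
    # and every string starts with ''.
    i = 0
    while not s2.startswith(s1[i:]):
        i += 1
    return s1[:i] + s2

no_overlap = merge('abc', 'xyz')      # nothing in common: plain concatenation
full_prefix = merge('abc', 'abcde')   # s1 is entirely a prefix of s2
print(no_overlap)
print(full_prefix)
```

So when there is no overlap at all, the result is simply s1 followed by s2.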
This should do what you want:
def overlap_concat(s1, s2):
    l = min(len(s1), len(s2))
    for i in range(l, 0, -1):
        if s1.endswith(s2[:i]):
            return s1 + s2[i:]
    return s1 + s2
Example:
>>> overlap_concat("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>>
>>> overlap_concat("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
>>>
Using str.endswith and enumerate:
def overlap(string1, string2):
    # Stop at the shortest prefix of string2 that string1 ends with.
    for i, s in enumerate(string2, 1):
        if string1.endswith(string2[:i]):
            break
    return string1 + string2[i:]
>>> overlap("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>> overlap("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
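If you want to account for trailing special characters, you will need some re-based substitution.
import re
string1 = re.sub(r'[^\w\s]', '', string1)
Note, though, that this removes all special characters from the first string.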
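To make that concrete, here is a minimal sketch of the clean-up step followed by the merge. The input with trailing punctuation is a made-up example, and the function is repeated so the snippet runs on its own.

```python
import re

def overlap(string1, string2):
    # Stop at the shortest prefix of string2 that string1 ends with.
    for i, _ in enumerate(string2, 1):
        if string1.endswith(string2[:i]):
            break
    return string1 + string2[i:]

raw = 'Hello how are you?!'            # hypothetical input with trailing punctuation
cleaned = re.sub(r'[^\w\s]', '', raw)  # keep only word characters and whitespace
result = overlap(cleaned, 'are you doing now?')
print(cleaned)
print(result)
```

Without the substitution, the trailing '?!' would keep any prefix of the second string from matching the end of the first.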
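A modification of the above function that finds the longest matching substring (rather than the shortest) involves traversing string2 in reverse.
def overlap(string1, string2):
    # Try prefixes of string2 from longest to shortest; the final (empty)
    # prefix always matches, so no overlap means plain concatenation.
    for i in range(len(string2) + 1):
        if string1.endswith(string2[:len(string2) - i]):
            break
    return string1 + string2[len(string2) - i:]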
>>> overlap('Where did', 'did you go?')
'Where did you go?'
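Why the direction matters: with a repeating pattern, the shortest and the longest overlap give different results. The sketch below uses two hypothetical helper names (overlap_shortest, overlap_longest) and a made-up input purely to show the difference.

```python
def overlap_shortest(string1, string2):
    # Forward scan: use the shortest prefix of string2 that string1 ends with.
    for i in range(1, len(string2) + 1):
        if string1.endswith(string2[:i]):
            return string1 + string2[i:]
    return string1 + string2  # no overlap

def overlap_longest(string1, string2):
    # Reverse scan: use the longest prefix of string2 that string1 ends with.
    for i in range(len(string2), 0, -1):
        if string1.endswith(string2[:i]):
            return string1 + string2[i:]
    return string1 + string2  # no overlap

print(overlap_shortest('xaba', 'abay'))  # 'xababay' (only the single 'a' is merged)
print(overlap_longest('xaba', 'abay'))   # 'xabay'   (the whole 'aba' is merged)
```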
The other answers are great, but they do fail for this input.
string1 = 'THE ACADEMY has'
string2= '.CADEMY has taken'
Output:
>>> merge(string1,string2)
'THE ACADEMY has.CADEMY has taken'
>>> overlap(string1,string2)
'THE ACADEMY has'
However, there is a standard library module, difflib, which proved to be effective in my case!
from difflib import SequenceMatcher

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))
print(match)  # -> Match(a=5, b=1, size=10)
print(string1[:match.a + match.size] + string2[match.b + match.size:])
Output:
Match(a=5, b=1, size=10)
THE ACADEMY has taken
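If you want to reuse this, it could be wrapped in a function along these lines. This is only a sketch: the name merge_on_longest_match and the explicit no-match branch are my additions. Keep in mind that it joins on the longest common block anywhere in the two strings, so whatever follows the block in string1 and whatever precedes it in string2 is dropped.

```python
from difflib import SequenceMatcher

def merge_on_longest_match(string1, string2):
    # Hypothetical helper name; joins the strings on their longest common block.
    matcher = SequenceMatcher(None, string1, string2)
    match = matcher.find_longest_match(0, len(string1), 0, len(string2))
    if match.size == 0:
        # Nothing in common: fall back to plain concatenation
        # (without this branch, string1 would be dropped entirely).
        return string1 + string2
    # Keep string1 up to the end of the block, then the rest of string2.
    return string1[:match.a + match.size] + string2[match.b + match.size:]

print(merge_on_longest_match('THE ACADEMY has', '.CADEMY has taken'))
print(merge_on_longest_match('Hello how are you', 'are you doing now?'))
```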
The words you want to replace appear in the second string, so you can try something like this:
new_string=[string2.split()]
new=[]
new1=[j for item in new_string for j in item if j not in string1]
new1.insert(0,string1)
print(" ".join(new1))
With the first test case:
string1 = 'Hello how are you'
string2 = 'are you doing now?'
Output:
Hello how are you doing now?
Second test case:
string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'
Output:
This is a nice ACADEMY you know!
Explanation:
First, we split the second string so that we can find the words that have to be removed or replaced:
new_string=[string2.split()]
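In the second step, we check each word of this split string against string1. If a word already occurs in string1, we keep only the copy in the first string and drop it from the second; otherwise the word stays: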
new1=[j for item in new_string for j in item if j not in string1]
This list comprehension is the same as:
new1=[]
for item in new_string:
    for j in item:
        if j not in string1:
            new1.append(j)
The last step puts string1 at the front of the list and joins everything:
new1.insert(0,string1)
print(" ".join(new1))
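One thing to be aware of with this approach: `j not in string1` is a substring test, not a word test. Below is a small sketch of the same logic wrapped in a function (the name merge_by_words and the second input are made up for illustration) that shows where this can surprise you.

```python
def merge_by_words(string1, string2):
    # Hypothetical wrapper name; same filtering logic as above.
    # `word not in string1` checks substrings, so 'is' counts as
    # present in 'This works' because of 'This'.
    kept = [word for word in string2.split() if word not in string1]
    return " ".join([string1] + kept)

print(merge_by_words('Hello how are you', 'are you doing now?'))
# 'Hello how are you doing now?'
print(merge_by_words('This works', 'works is fine'))
# 'This works fine' (the word 'is' is dropped as well)
```

It also always joins with single spaces, so it suits word-level overlaps better than overlaps in the middle of a word.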