How do I remove duplicate words between two strings in Python?

Question

I am working on a project that uses OCR. After some processing, I end up with two strings like these:

s1 = "This text is a test of"
s2 = "a test of the reading device"

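I would like to know how to remove the repeated words from the second string. My idea is to find the position of each repeated word in both lists. I tried this: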

e1 = s1.split()
e2 = s2.split()

for i, item2 in enumerate(e2):
    if item2 in e1:
        print i, item2  # repeated word and its index in the second string
        print e1.index(item2)  # index of its first occurrence in the first string

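Now I have the repeated words and their positions in the first and second lists. I need to use these to compare the strings word by word and check whether the repeated words appear in the same order. This matters because the same word may occur two or more times in a string (this is for future-proofing).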

In the end, I want a final string like this:

ns2 = "the reading device"
sf = "This text is a test of the reading device"

I am using Python 2.7 on Windows 7.

Answer 1

Here is another attempt:

from difflib import SequenceMatcher as sq
match = sq(None, s1, s2).find_longest_match(0, len(s1), 0, len(s2))

Result

print s1 + s2[match.b+match.size:]

This text is a test of the reading device
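The snippet above matches character by character, so in principle the longest match could start or end in the middle of a word. A minimal sketch of the same idea applied to word lists follows; the function name `merge_on_longest_match` is made up for illustration, and the sketch assumes the shared run is the one you want to cut at (`find_longest_match` returns the longest common run anywhere in the two lists, not necessarily the end of `s1` and the start of `s2`).

```python
from difflib import SequenceMatcher


def merge_on_longest_match(s1, s2):
    """Append to s1 the words of s2 that follow the longest shared word run.

    Hypothetical helper, not part of difflib.
    """
    w1, w2 = s1.split(), s2.split()
    # autojunk=False turns off difflib's "popular element" heuristic,
    # which could otherwise ignore frequent words in long inputs.
    matcher = SequenceMatcher(None, w1, w2, autojunk=False)
    match = matcher.find_longest_match(0, len(w1), 0, len(w2))
    # If nothing matches, match.size is 0 and match.b is 0,
    # so all of s2 is appended unchanged.
    return " ".join(w1 + w2[match.b + match.size:])


s1 = "This text is a test of"
s2 = "a test of the reading device"
print(merge_on_longest_match(s1, s2))
```

Note that joining with a single space normalizes any runs of whitespace in the inputs.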

Answer 2

Maybe this?

' '.join([x for x in s1.split(' ')] + [y for y in s2.split(' ') if y not in s1.split(' ')])

I haven't tested it carefully, but it might be a good way to handle this kind of requirement.
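One caveat with the one-liner above: it drops every word of `s2` that occurs anywhere in `s1`, even outside the overlapping part. For example, with "the cat sat on" and "sat on the mat" it would also drop the second "the". If the goal is only to remove the overlap where the end of `s1` repeats the start of `s2`, a rough sketch could look like this (the function name `merge_overlap` is made up for illustration, and it assumes the overlap is an exact word-for-word match):

```python
def merge_overlap(s1, s2):
    """Join s1 and s2, dropping the words at the start of s2
    that repeat the words at the end of s1.

    Hypothetical helper for illustration only.
    """
    w1, w2 = s1.split(), s2.split()
    # Try the longest possible overlap first and shrink it until
    # the tail of w1 equals the head of w2.
    for k in range(min(len(w1), len(w2)), 0, -1):
        if w1[-k:] == w2[:k]:
            return " ".join(w1 + w2[k:])
    # No overlap found: plain concatenation.
    return " ".join(w1 + w2)


print(merge_overlap("This text is a test of", "a test of the reading device"))
print(merge_overlap("the cat sat on", "sat on the mat"))
```

Because whole word lists are compared, a word that appears several times in either string is only removed when it is part of the overlapping run.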

Answer 1: 2 (accepted), 2017-01-11 06:38:35
Answer 2: 0, 2017-01-11 07:14:51
