Removing multiple character sequences from a string

Question

If I had a string like so:

my_string = 'this is is is is a string'

How would I remove the repeated 'is' words so that only one remains?

The string could contain any number of repeated 'is' words, such as:

my_string = 'this is is a string'
other_string = 'this is is is is is is is is a string'

I suppose a regex solution would be possible, but I'm not sure how to go about it. Thanks.

Answer 1

You can use itertools.groupby:

from itertools import groupby
a = 'this is is is is a a a string string a a a'
# groupby yields one key per run of identical adjacent words
print(' '.join(word for word, _ in groupby(a.split(' '))))
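
To see why this works: groupby starts a new group every time the value changes, so each run of identical adjacent words produces exactly one key. A minimal sketch for Python 3, wrapped in a helper called collapse_repeats (a name made up for this illustration, not part of itertools):

```python
from itertools import groupby

def collapse_repeats(text):
    """Collapse runs of identical adjacent words into a single word."""
    # split() with no argument also normalizes repeated whitespace
    words = text.split()
    # groupby yields one (key, group) pair per run of equal neighbours
    return ' '.join(word for word, _ in groupby(words))

print(collapse_repeats('this is is is is a a a string string a a a'))
# this is a string a
```

Note that the trailing 'a' survives: only adjacent repeats are collapsed, so a word that reappears later in the string is kept.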

Answer 2

Here is my approach:

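my_string = 'this is is a string'
other_string = 'this is is is is is is is is a string'
def getStr(s):
    # keep the first occurrence of each word; note that this drops
    # every later duplicate, not only adjacent ones
    res = []
    for word in s.split():
        if word not in res:
            res.append(word)
    return ' '.join(res)

print(getStr(my_string))
print(getStr(other_string))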

Output:

this is a string
this is a string
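
Keep in mind that this approach removes every later occurrence of a word, not just adjacent repeats. If that is the behaviour you want, a shorter sketch is possible on Python 3.7+, where dicts are guaranteed to keep insertion order. The name unique_words is hypothetical:

```python
def unique_words(text):
    """Keep only the first occurrence of each word, preserving order."""
    # dict.fromkeys keeps keys in first-seen order (guaranteed in Python 3.7+)
    return ' '.join(dict.fromkeys(text.split()))

print(unique_words('this is is a string'))
# this is a string
print(unique_words('a b a b'))
# a b
```

The second call shows the difference from the groupby answers: 'a b a b' has no adjacent repeats, yet it is still shortened.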

UPDATE: The regex way to attack it:

import re
# \b keeps a word from matching the start of a longer one (e.g. 'is' in 'island')
print(' '.join(re.findall(r'\b(\w+)(?:\s+\1\b)*', other_string)))

Answer 3

If you would like to remove all consecutive duplicates, you can try:

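words = my_string.split()
tmp = words[:1]  # start with the first word (empty list if there are no words)
for word in words[1:]:
    # only keep a word if it differs from the one kept just before it
    if word != tmp[-1]:
        tmp.append(word)
# join avoids the trailing space that repeated += ' ' would leave
my_string = ' '.join(tmp)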

Of course, if you want something smarter than this, it is going to be more complicated.
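
As one guess at what "smarter" might mean (an assumption, since the question does not ask for it): treating 'Is' and 'is' as the same word. A sketch using itertools.groupby with a key function; collapse_repeats_ci is a hypothetical name:

```python
from itertools import groupby

def collapse_repeats_ci(text):
    """Collapse adjacent repeats, ignoring case; keep the first spelling of each run."""
    # key=str.lower makes 'Is' and 'is' compare equal when grouping
    runs = groupby(text.split(), key=str.lower)
    # next(group) is the first word of the run, in its original casing
    return ' '.join(next(group) for _, group in runs)

print(collapse_repeats_ci('This is Is IS a string'))
# This is a string
```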

Answer 4

For one-liners:

>>> import itertools
>>> my_string = 'this is is a string'
>>> " ".join([k for k, g in itertools.groupby(my_string.split())])
'this is a string'

Answer 5

Regex to the rescue!

\b(\w+)(?:\s+\1\b)+
# \b          word boundary, so the match starts at the beginning of a word
# (\w+)       capturing group 1: at least ONE word character
# (?: ... )+  non-capturing group that needs to be present at least once,
#             but can be there multiple times
# \s+         whitespace
# \1\b        the formerly captured group again, ending at a word boundary

Python Code

import re

string = """
this is is is is is is is is a string
and here is another another another another example
"""
# \1 is the word itself; each run of repeats is replaced by a single copy
rx = r'\b(\w+)(?:\s+\1\b)+'

string = re.sub(rx, r'\1', string)
print(string)
# this is a string
# and here is another example

Demos

See a demo for this approach on regex101.com as well as on ideone.com.
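
A variation on the regex idea, offered as a sketch and not as the answer's own code: because \s also matches newlines, a word that ends one line and starts the next would be merged. Assuming that is not wanted, the separator can be limited to spaces and tabs. The names REPEATED_WORD and collapse_repeats_per_line are made up for this example:

```python
import re

# \b(\w+)          a whole word, captured as group 1
# (?:[ \t]+\1\b)+  one or more repeats of that word, separated by spaces/tabs only
REPEATED_WORD = re.compile(r'\b(\w+)(?:[ \t]+\1\b)+')

def collapse_repeats_per_line(text):
    """Replace each run of a repeated word with one copy, never across newlines."""
    return REPEATED_WORD.sub(r'\1', text)

text = 'this is is is a string\nstring theory is is fun'
print(collapse_repeats_per_line(text))
# this is a string
# string theory is fun
```

The trailing \b matters: without it, 'is island' would be treated as a repeat of 'is' and come out as 'island' with its first two letters eaten.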

5 answers

Answer 1
1 2016-04-13 17:21:46

Answer 2
1 2016-04-13 17:23:51

Answer 3
0 2016-04-13 17:15:27

Answer 4
0 2016-04-13 17:21:37

Answer 5
0 accepted 2016-04-13 18:17:54
