Regex to replace repeated words in a string with one copy in Python
How can I replace repeated words in a string with just one copy?
For example:
hi hi hello hello hello bye bye bye bye
should become:
hi hello bye
My code:
import re
s = "hi hi hello hello hello bye bye bye bye"
# Intended to collapse each run of a repeated word into one copy
m = re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\2', s)
print(m)
Output:
hi hi hello bye
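Why the attempt leaves "hi hi" alone: the outer group ((\S+)(?:\s+\2)) already consumes one repeat, and the trailing (?:\s+\2)+ demands at least one more, so the pattern only matches a word that appears three or more times in a row. The sketch below is a minimal variant of that whitespace-based approach (not the answer's pattern, which follows): it drops the extra group so a single repeat is enough. The helper name collapse_repeats_ws is hypothetical.

```python
import re

# Hypothetical helper: a whitespace-delimited variant of the question's pattern.
# (?<!\S) and (?!\S) keep the match aligned to whole words, (\S+) captures a word,
# and (?:\s+\1)+ requires ONE or more further copies (the original required two).
_REPEAT_WS = re.compile(r'(?<!\S)(\S+)(?:\s+\1)+(?!\S)')

def collapse_repeats_ws(text):
    """Replace each run of a repeated whitespace-separated word with one copy."""
    return _REPEAT_WS.sub(r'\1', text)

s = "hi hi hello hello hello bye bye bye bye"
original = re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\2', s)
print(original)                # -> hi hi hello bye  (the "hi hi" pair is missed)
print(collapse_repeats_ws(s))  # -> hi hello bye
```

Because \s+ accepts any run of whitespace, inputs such as "hi\thi" or "hi  hi" also collapse to "hi" with this variant.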
You can use:
re.sub(r'\b(\S+)(?: \1)+\b', r'\1', s)
The \b escape is a zero-width match at a word boundary: a position between a word character and a non-word character (such as whitespace or punctuation), or at the start or end of the text next to a word character. Using it lets the rest of the pattern work without inputs like "goodbye bye" or "foo foobar" getting trimmed incorrectly.
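To illustrate what the two \b anchors prevent, the sketch below runs the answer's pattern next to the same pattern with the anchors removed, on the two inputs just mentioned. The constant names WITH_B and WITHOUT_B are only illustrative.

```python
import re

WITH_B = r'\b(\S+)(?: \1)+\b'  # the answer's pattern
WITHOUT_B = r'(\S+)(?: \1)+'   # same pattern with the word-boundary anchors removed

for text in ["goodbye bye", "foo foobar"]:
    with_b = re.sub(WITH_B, r'\1', text)
    without_b = re.sub(WITHOUT_B, r'\1', text)
    print(f"{text!r}: with \\b -> {with_b!r}, without \\b -> {without_b!r}")
# 'goodbye bye': with \b -> 'goodbye bye', without \b -> 'goodbye'
# 'foo foobar': with \b -> 'foo foobar', without \b -> 'foobar'
```

Without the leading \b, (\S+) can start in the middle of "goodbye", so "bye bye" matches and the text shrinks to "goodbye"; without the trailing \b, "foo foo" matches the beginning of "foobar", leaving "foobar".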
The inner part of the pattern matches a word followed by one or more repeats of the same word, each separated by a single space. The whole match is replaced by one copy of the word.
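Putting it together, here is a minimal sketch that wraps the answer's pattern in a function (the name dedupe_words is hypothetical) and applies it to the example string from the question. Because the separator in the pattern is a literal single space, and backreferences compare case-sensitively by default, a couple of near-duplicates are deliberately shown passing through unchanged.

```python
import re

# The answer's pattern: \b anchors around a word plus one or more " word" repeats.
DUP_RUN = re.compile(r'\b(\S+)(?: \1)+\b')

def dedupe_words(text):
    """Collapse each run of a word repeated with single spaces into one copy."""
    return DUP_RUN.sub(r'\1', text)

s = "hi hi hello hello hello bye bye bye bye"
print(dedupe_words(s))               # -> hi hello bye
print(repr(dedupe_words("hi  hi")))  # -> 'hi  hi' (unchanged: two spaces between copies)
print(repr(dedupe_words("Hi hi")))   # -> 'Hi hi' (unchanged: \1 is case-sensitive)
```

Compiling the pattern once with re.compile is only a convenience here; calling re.sub with the raw pattern string gives the same results.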
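Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.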