Regex to replace repeated words in a string with a single word in Python

Question

How can I replace repeated words in a string with just one copy?

For example:

hi hi hello hello hello bye bye bye bye

should become:

hi hello bye

My code:

import re
s = "hi hi hello hello hello bye bye bye bye"
m=re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\2', s)
print(m)

Output:

hi hi hello bye

Answer 1

You can use:

re.sub(r'\b(\S+)(?: \1)+\b', r'\1', s)

The \b escape is a zero-width match for a word boundary (the position between a word character and a non-word character such as whitespace, or the start or end of the text). Using it lets the rest of the pattern work without inputs like goodbye bye or foo foobar getting trimmed incorrectly.
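To see what the boundaries buy you, here is a small sketch comparing the pattern with and without \b. The constant names are made up for illustration, and the second pattern is only the answer's pattern with both \b removed.

```python
import re

WITH_BOUNDARIES = r'\b(\S+)(?: \1)+\b'   # pattern from the answer
WITHOUT_BOUNDARIES = r'(\S+)(?: \1)+'    # same pattern with both \b removed

for text in ["goodbye bye", "foo foobar", "bye bye"]:
    kept = re.sub(WITH_BOUNDARIES, r'\1', text)
    trimmed = re.sub(WITHOUT_BOUNDARIES, r'\1', text)
    print(repr(text), '->', repr(kept), 'vs', repr(trimmed))

# 'goodbye bye' -> 'goodbye bye' vs 'goodbye'
# 'foo foobar' -> 'foo foobar' vs 'foobar'
# 'bye bye' -> 'bye' vs 'bye'
```

Without the boundaries, the regex is free to start matching in the middle of goodbye or stop in the middle of foobar, which is where the incorrect trimming comes from.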

The inner part of the pattern matches a word followed by one or more repeats of the same word, each preceded by a single space. The whole match is replaced by one copy of the word.
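Put together, a minimal runnable sketch might look like this. The helper name collapse_repeats is hypothetical, not something from the question. The question's pattern is included for comparison: it appears to need at least three copies of a word, because it has one mandatory (?:\s+\2) followed by a further (?:\s+\2)+, which would explain why hi hi was left alone.

```python
import re

# Pattern from the answer: a word, then one or more " word" repeats.
REPEATED_WORD = re.compile(r'\b(\S+)(?: \1)+\b')

def collapse_repeats(text):
    """Replace each run of a repeated word with one copy (hypothetical helper)."""
    return REPEATED_WORD.sub(r'\1', text)

s = "hi hi hello hello hello bye bye bye bye"
print(collapse_repeats(s))  # hi hello bye

# Pattern from the question: one required repeat plus one-or-more further
# repeats, so a word that appears only twice is never matched.
QUESTION_PATTERN = r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)'
print(re.sub(QUESTION_PATTERN, r'\2', s))  # hi hi hello bye
```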

Answer 1: 2016-05-13 07:35:23
