Python regular expression to remove repeated words

Question

I am very new to Python.

I want to change a sentence if it contains repeated words.

Correct:

Ex. "this just so so so nice" --> "this just so nice"
Ex. "this is just is is" --> "this is just is"

Right now I am using this regex, but it also changes letters. Ex. "My friend and i is happy" --> "My friend and is happy" (ERROR: it removes the "i" and the space)

text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row

How can I make the same change, but have it check whole words instead of letters?

Answer 1

Non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice" 
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'

Answer 2

text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text) #remove duplicated words in row

\b matches the empty string, but only at the beginning or end of a word.
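To see what the word boundaries change, the pattern from the question and the pattern from this answer can be run side by side. A small sketch; the variable names are chosen here for illustration only.

```python
import re

sentence = "My friend and i is happy"

# Without \b the group may match part of a word,
# so the doubled "p" inside "happy" is collapsed
letters = re.sub(r'(\w+)\1', r'\1', sentence)

# With \b on both sides, only whole words repeated
# after a single space are collapsed
words = re.sub(r'\b(\w+)( \1\b)+', r'\1', sentence)

print(letters)  # My friend and i is hapy
print(words)    # My friend and i is happy
```

The anchored pattern leaves this sentence unchanged because no whole word is immediately followed by a copy of itself.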

Answer 3

\b: Matches word boundaries
\w: Matches any word character
\1: Backreference to the first capture group. In the pattern it matches a repeat of the captured word; in the replacement it inserts that word once.

    import re

    def Remove_Duplicates(Test_string):
        # \b(\w+) captures a word; (?:\W\1\b)+ matches one or more repeats of it,
        # each preceded by exactly one non-word character
        Pattern = r"\b(\w+)(?:\W\1\b)+"
        # Replace the whole run with the first captured word, ignoring case
        return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)

    Test_string1 = "Good bye bye world world"
    Test_string2 = "Ram went went to to his home"
    Test_string3 = "Hello hello world world"

    print(Remove_Duplicates(Test_string1))
    print(Remove_Duplicates(Test_string2))
    print(Remove_Duplicates(Test_string3))

Result:

    Good bye world
    Ram went to his home
    Hello world
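Because the separator in this pattern is a single \W, it accepts any one non-word character between repeats, not only a space. The sketch below probes that behaviour; the helper names `dedupe` and `dedupe_ws` are hypothetical, and the `\s+` variant is an assumption added here for comparison, not part of the original answer.

```python
import re

def dedupe(text):
    # Pattern from the answer: exactly one non-word character between repeats
    return re.sub(r"\b(\w+)(?:\W\1\b)+", r"\1", text, flags=re.IGNORECASE)

def dedupe_ws(text):
    # Assumed variant: one or more whitespace characters between repeats
    return re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)

print(dedupe("well-well done"))  # a hyphen counts as a separator: well done
print(dedupe("so  so nice"))     # two spaces do not match: unchanged
print(dedupe_ws("so  so nice"))  # the \s+ variant collapses it: so nice
```

Which separator is appropriate depends on the input: \W also collapses hyphenated repeats, while \s+ tolerates extra whitespace but leaves punctuation-separated repeats alone.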

Answer 1
9 2013-06-21 15:10:34

Answer 2
6 Accepted 2013-06-21 15:15:18

Answer 3
0 2021-02-17 19:22:46
