Python regular expression to remove repeated words

Question

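I am new to Python.

I want to change a sentence if it contains repeated words.

Correct:

e.g. "this just so so so nice" --> "this just so nice"
e.g. "this is just is is" --> "this is just is"

Right now I am using this regex, but it also changes letters inside words. e.g. "My friend and I are happy" --> "My friend and are happy" (it removes the "I" and the space). Wrong.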

text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row

How can I make the same change, but have it check whole words instead of letters?

Answer 1

A non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice" 
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
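
The same idea can be wrapped in a small function. This is only a sketch: the name `dedupe_adjacent` and the optional `ignore_case` flag are made up for illustration and are not part of the answer above. Note that `split()` followed by `" ".join(...)` also collapses any run of whitespace into a single space.

```python
from itertools import groupby

def dedupe_adjacent(text, ignore_case=False):
    """Collapse runs of identical adjacent words, keeping the first word of each run."""
    # Hypothetical helper: compare words case-insensitively only when asked to.
    key = str.lower if ignore_case else None
    # groupby yields one (key, group) pair per run of consecutive equal keys;
    # taking the first item of each group preserves the original spelling.
    return " ".join(next(group) for _, group in groupby(text.split(), key=key))

print(dedupe_adjacent("this is just is is"))
print(dedupe_adjacent("Hello hello world", ignore_case=True))
```

Only consecutive repeats are removed, so the second "is" in "this is just is is" survives, which matches what the question asks for.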

Answer 2

text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text) #remove duplicated words in row

\b matches the empty string, but only at the beginning or end of a word.
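
To see what the word boundaries change, the two patterns can be run side by side. This is a small sketch; the variable names and the sample sentence are chosen here for illustration.

```python
import re

letters_pattern = r'(\w+)\1'        # pattern from the question: no boundaries, no separator
words_pattern = r'\b(\w+)( \1\b)+'  # pattern from this answer

sample = "this is so so happy"

# Without \b the pattern matches the doubled "p" inside "happy", and it never
# matches "so so" because nothing in it accounts for the space between the words.
print(re.sub(letters_pattern, r'\1', sample))  # this is so so hapy

# With \b on both sides, only whole words repeated after a single space match.
print(re.sub(words_pattern, r'\1', sample))    # this is so happy
```

The group `( \1\b)+` repeats, so three or more copies of a word in a row are also reduced to one.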

Answer 3

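\b: matches a word boundary
\w: matches any word character

\1: a backreference to the first captured group; in the pattern it matches the same word again, and in the replacement it inserts that word once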

    import re

    def Remove_Duplicates(Test_string):
        # \b(\w+) captures a whole word; (?:\W\1\b)+ matches one or more
        # repeats of it, each preceded by a single non-word character
        Pattern = r"\b(\w+)(?:\W\1\b)+"
        return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)

    Test_string1 = "Good bye bye world world"
    Test_string2 = "Ram went went to to his home"
    Test_string3 = "Hello hello world world"

    print(Remove_Duplicates(Test_string1))
    print(Remove_Duplicates(Test_string2))
    print(Remove_Duplicates(Test_string3))

Result:

    Good bye world
    Ram went to his home
    Hello world
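
One thing to be aware of: `\W` matches exactly one non-word character, so repeats separated by two spaces, or by a comma and a space, are left alone. If that matters, a looser variant could use `\W+` instead. This is a sketch under that assumption; the name `remove_duplicates_loose` is hypothetical.

```python
import re

def remove_duplicates_loose(text):
    # \W+ allows one or more non-word characters (spaces, punctuation)
    # between the repeated words, instead of exactly one.
    pattern = r"\b(\w+)(?:\W+\1\b)+"
    return re.sub(pattern, r"\1", text, flags=re.IGNORECASE)

print(remove_duplicates_loose("bye,  bye world"))
print(remove_duplicates_loose("Good bye bye world world"))
```

The trade-off is that any punctuation between the repeats is dropped together with the duplicate word, which may or may not be what you want.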

Answer 1
9 2013-06-21 15:10:34

Answer 2
6 Accepted 2013-06-21 15:15:18

Answer 3
0 2021-02-17 19:22:46