Python regular expression to remove repeated words
I am very new to Python.
I would like to change a sentence if it contains repeated words.
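Correct: e.g. "this just so so so nice" --> "this just so nice"
Right now I am using this regex, but it also makes changes at the letter level. Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space): ERROR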
text = re.sub(r'(\w+)\1', r'\1', text) #remove duplicated words in row
How can I make the same change, but have it check whole words instead of letters?
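To see why the pattern in the question works on letters, here is a small sketch. The helper name collapse_naive and the sample sentences are only illustrative. Because (\w+)\1 has no word boundaries and no separator between the group and the backreference, it can only match a run of characters that is immediately repeated, such as the "pp" in "happy". Words separated by a space never match at all.

```python
import re

def collapse_naive(text):
    # The pattern from the question: no \b anchors, no space before \1,
    # so the group may be a single letter inside a word.
    return re.sub(r'(\w+)\1', r'\1', text)

# The doubled "p" in "happy" is collapsed.
print(collapse_naive("My friend and i is happy"))   # My friend and i is hapy

# Repeated words are left alone, because a space sits between them.
print(collapse_naive("this just so so so nice"))    # this just so so so nice
```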
A non-regex solution using itertools.groupby:
>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice"
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
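The same idea can be wrapped in a function. This is only a sketch: the name dedupe_adjacent and the ignore_case flag are my own additions, not part of the answer above. Note that split() also collapses runs of whitespace, so the result is always joined with single spaces.

```python
from itertools import groupby

def dedupe_adjacent(sentence, ignore_case=False):
    """Collapse runs of identical adjacent words into one word."""
    words = sentence.split()
    # With ignore_case, words are grouped by their lowercase form,
    # but the first spelling in each run is the one that is kept.
    key = str.lower if ignore_case else None
    return " ".join(next(group) for _, group in groupby(words, key=key))

print(dedupe_adjacent("this is just is is"))        # this is just is
print(dedupe_adjacent("this just so so so nice"))   # this just so nice
print(dedupe_adjacent("Hello hello world world", ignore_case=True))  # Hello world
```

groupby only merges consecutive equal items, which is why the first "is" in "this is just is is" survives.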
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text) #remove duplicated words in row
\b
Matches the empty string, but only at the beginning or end of a word.
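A quick check of what the \b anchors buy you. This is a sketch: remove_repeats and the NO_TRAILING_BOUNDARY variant are hypothetical names added here for comparison, and the test sentences are just examples.

```python
import re

# Pattern from the answer above: whole words only, separated by one space.
WITH_BOUNDARIES = re.compile(r'\b(\w+)( \1\b)+')
# Hypothetical variant without the trailing \b, for comparison only.
NO_TRAILING_BOUNDARY = re.compile(r'\b(\w+)( \1)+')

def remove_repeats(text, pattern=WITH_BOUNDARIES):
    return pattern.sub(r'\1', text)

print(remove_repeats("this just so so so nice"))   # this just so nice
print(remove_repeats("this is just is is"))        # this is just is
print(remove_repeats("My friend and i is happy"))  # unchanged
print(remove_repeats("this is island"))            # unchanged

# Without the trailing \b, "is" also matches the start of "island".
print(remove_repeats("this is island", NO_TRAILING_BOUNDARY))  # this island
```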
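\b: matches a word boundary
\w: any word character
\W: any non-word character
\1: backreference to the first capturing group; in the pattern it matches the same word again, and in the replacement it puts that word back once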
import re

def Remove_Duplicates(Test_string):
    # A word, then one or more repeats of it, each preceded by one non-word character
    Pattern = r"\b(\w+)(?:\W\1\b)+"
    # Keep only the first occurrence; IGNORECASE treats "Hello hello" as a repeat
    return re.sub(Pattern, r"\1", Test_string, flags=re.IGNORECASE)

Test_string1 = "Good bye bye world world"
Test_string2 = "Ram went went to to his home"
Test_string3 = "Hello hello world world"

print(Remove_Duplicates(Test_string1))
print(Remove_Duplicates(Test_string2))
print(Remove_Duplicates(Test_string3))
Result:
Good bye world
Ram went to his home
Hello world
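One difference between the two regex answers is the separator: a literal space in one, a single \W in the other. If repeats may be separated by punctuation plus a space, one possible tweak is \W+, which is my own assumption and not something either answer proposes. A sketch comparing the two:

```python
import re

# Space-only separator, as in the earlier answer.
SPACE_ONLY = re.compile(r'\b(\w+)( \1\b)+')
# Assumed variant: one or more non-word characters between repeats.
ANY_SEPARATOR = re.compile(r'\b(\w+)(?:\W+\1\b)+', flags=re.IGNORECASE)

def dedupe(text, pattern):
    return pattern.sub(r'\1', text)

print(dedupe("so, so nice", SPACE_ONLY))           # so, so nice
print(dedupe("so, so nice", ANY_SEPARATOR))        # so nice
print(dedupe("Hello hello world", ANY_SEPARATOR))  # Hello world
```

The looser separator also removes the punctuation between the repeats, which may or may not be what you want.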