
Regex: match consecutive punctuation marks and replace by the first

I am trying to collapse runs of certain predefined punctuation marks, replacing each run with its first mark. Thus:

  1. us, -> us
  2. us -> us
  3. us! -> us
  4. hiiii!!!, -> hiiii!

I tried the following code:

import re
r = re.compile(r'([.,/#!$%^&*;:{}=-_`~()])*\1')
n = r.sub(r'\1', "ews by almalki : Tornado, flood deaths reach 18 in U.s., more storms ahead ")
print(n)

You just need to capture the first punctuation mark and match the rest:

([.,/#!$%^&*;:{}=_`~()-])[.,/#!$%^&*;:{}=_`~()-]+

See the regex demo.

Note that the - must be placed at the end (or start) of the character class so that it does not create a range; alternatively, it can be escaped inside the class. In the pattern from the question, =-_ is read as a range from = to _, which also matches uppercase letters and a few other symbols.
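To illustrate the point about the hyphen, here is a small sketch comparing two reduced character classes. The names range_class and literal_class are made up for this example and are not part of the original answer.

```python
import re

# A hyphen between '=' and '_' forms a range (U+003D to U+005F),
# which covers '>', '?', '@', 'A'-'Z', '[', '\', ']', '^' and '_'.
range_class = re.compile(r'[=-_]')

# A hyphen placed last in the class is a literal hyphen.
literal_class = re.compile(r'[=_-]')

print(bool(range_class.fullmatch('A')))    # True: 'A' falls inside the range
print(bool(range_class.fullmatch('-')))    # False: '-' itself is not in the class
print(bool(literal_class.fullmatch('A')))  # False
print(bool(literal_class.fullmatch('-')))  # True
```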

Details:

  • ([.,/#!$%^&*;:{}=_`~()-]) - a capturing group that matches one of the punctuation symbols you defined
  • [.,/#!$%^&*;:{}=_`~()-]+ - one or more further punctuation symbols from the same set

Python demo:

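import re
# Group 1 captures the first mark; the second class consumes the rest of the run.
r = re.compile(r'([.,/#!$%^&*;:{}=_`~()-])[.,/#!$%^&*;:{}=_`~()-]+')
# Replace each run of two or more marks with the captured first mark.
n = r.sub(r'\1', "ews by almalki : Tornado, flood deaths reach 18 in U.s., more storms ahead ")
print(n)  # the ".," after "U.s" becomes "."; single marks are left alone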
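As a rough sketch of how this could be reused, the pattern can be wrapped in a helper and tried on inputs such as the hiiii!!!, example from the question. The names PUNCT, RUN and collapse_punctuation are hypothetical and chosen only for this illustration.

```python
import re

# The punctuation set from the answer, written once so both halves of the pattern agree.
PUNCT = r'[.,/#!$%^&*;:{}=_`~()-]'

# One captured mark followed by one or more further marks from the same set.
RUN = re.compile(f'({PUNCT}){PUNCT}+')

def collapse_punctuation(text):
    """Replace each run of two or more listed marks with the first mark of the run."""
    return RUN.sub(r'\1', text)

print(collapse_punctuation("hiiii!!!,"))          # hiiii!
print(collapse_punctuation("U.s., more storms"))  # U.s. more storms
print(collapse_punctuation("us!"))                # us!
```

A single mark is left untouched because the pattern requires at least two consecutive marks before it matches.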
