简体   繁体   English

Python Regex - 替换连续的标点符号

[英]Python Regex - replacing consecutive punctuation

I have a bunch of online reviews, and I would like to replace things such as " ,,, or !! with , and ! , respectively, for example. I think my question boils down twofold:我有一大堆的在线评论,我想更换的东西,如“ ,,,!!,! ,分别,例如我想我的问题归结双重的:

  1. How do I compile a regex pattern to detect two+ back-to-back punctuation?如何编译正则表达式模式以检测两个以上的背靠背标点符号?
  2. Upon some method such as replace, how do I retrieve which particular pattern was detected?在某些方法(例如替换)下,如何检索检测到的特定模式? So that I could use that pattern to replace its consecutiveness.这样我就可以使用该模式来替换其连续性。

As an extension, how about replacing weird punctuation that you may see in Tweets such as !! ! :)作为扩展,如何替换您可能在推文中看到的奇怪标点符号,例如!! ! :) !! ! :) !! ! :) ? !! ! :) ?

Any guidance helps.任何指导都有帮助。

You could use re.sub here:你可以在这里使用re.sub

inp = "Hello World!!!"
output = re.sub(r'([!?,;])\1+', r'\1', inp)
print(inp + "\n" + output)

This prints:这打印:

Hello World!!!
Hello World!

You may expand the character class [!?,;] to include other punctuation characters as you want.您可以根据需要扩展字符类[!?,;]以包含其他标点符号。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM