
How can I compress repetitive characters to a single character using RE in Python?

I want to be able to replace any consecutive occurrences of punctuation characters in a string with a single occurrence. For example:

  • "I went to the park...." => "I went to the park."
  • "Are you serious??!!???!" => "Are you serious?!?!"

The first thing that came to mind was to:

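import re, string
for char in string.punctuation:
    # a function replacement inserts char literally, even when it is a backslash
    text = re.sub(re.escape(char) + "+", lambda m: char, text)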

However, since this is going to run repeatedly, I was wondering whether the same result can be achieved with a single regular expression so that it runs faster. What do you think?

You could try:

text = re.sub(r"([" + re.escape(string.punctuation) + r"])\1+", r"\1", text)

This uses re.escape() so that any punctuation characters with a special meaning in a regular expression are escaped. The backreference \1 refers to the text captured by the parentheses (), which is the first punctuation character matched. The pattern therefore matches runs of two or more of the same punctuation character, and the replacement leaves a single copy of that character.
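Since the question mentions that this runs repeatedly, here is a minimal sketch of the same pattern compiled once and wrapped in a function. The names REPEATED_PUNCT and squeeze_punctuation are made up for illustration and are not part of the original answer; whether precompiling gives a noticeable speedup depends on your workload, so measure before relying on it.

```python
import re
import string

# Compile once, since the substitution will run many times.
# REPEATED_PUNCT is a hypothetical name chosen for this sketch.
REPEATED_PUNCT = re.compile(r"([" + re.escape(string.punctuation) + r"])\1+")


def squeeze_punctuation(text):
    """Collapse each run of one repeated punctuation character to a single copy."""
    return REPEATED_PUNCT.sub(r"\1", text)


print(squeeze_punctuation("I went to the park...."))   # I went to the park.
print(squeeze_punctuation("Are you serious??!!???!"))  # Are you serious?!?!
```

Note that only runs of the same character are collapsed, so a mixed sequence such as "?!" is left alone, which matches the second example in the question.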

text = re.sub(r'([!?.])\1+', r'\1', text)  # collapses runs of !, ? and . only


 