How can I compress repetitive characters to a single character using RE in Python?

Question

I want to be able to replace any consecutive occurrences of punctuation characters in a string with a single occurrence. For example:

"I went to the park...." => "I went to the park."
"Are you serious??!!???!" => "Are you serious?!?!"

The first thing that came to mind was to:

for char in string.punctuation:
    # A callable replacement avoids a "bad escape" error when char is a backslash
    text = re.sub(re.escape(char) + "+", lambda m: char, text)

However, since this will run many times, I was wondering whether it can be done with a single RE to make it faster. What do you think?

Answer 1

You could try:

text = re.sub(r"([" + re.escape(string.punctuation) + r"])\1+", r"\1", text)

This uses re.escape() to ensure that the punctuation characters are escaped where necessary. The \1 backreference refers to the group in parentheses (), which captures the first punctuation character matched. The pattern therefore replaces any run of two or more identical punctuation characters with a single occurrence of that character.
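Since the question mentions running this repeatedly, here is a minimal sketch of the same pattern compiled once and reused. The names PUNCT_RUN and squeeze_punctuation are made up for illustration and are not part of the answer; whether precompiling gives a measurable speedup depends on the workload, since the re module also caches recently used patterns.

```python
import re
import string

# Compile once so every call reuses the same pattern object.
# PUNCT_RUN and squeeze_punctuation are hypothetical names for this sketch.
PUNCT_RUN = re.compile(r"([" + re.escape(string.punctuation) + r"])\1+")


def squeeze_punctuation(text):
    """Collapse each run of one repeated punctuation character to a single one."""
    return PUNCT_RUN.sub(r"\1", text)


print(squeeze_punctuation("I went to the park...."))   # I went to the park.
print(squeeze_punctuation("Are you serious??!!???!"))  # Are you serious?!?!
```

Because the backreference only matches the same character again, a mixed run such as "?!" is left alone and only the repeats inside it are squeezed.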

Answer 2

re.sub(r'([!?.])\1+', r'\1', text)
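As a rough illustration of what the narrower character class does, the sketch below wraps the pattern in a helper (squeeze_end_punctuation and END_PUNCT_RUN are hypothetical names, not from the answer). It only touches runs of "!", "?" and ".", so other repeated punctuation passes through unchanged.

```python
import re

# Only sentence-ending punctuation is in the class; names are illustrative.
END_PUNCT_RUN = re.compile(r"([!?.])\1+")


def squeeze_end_punctuation(text):
    """Collapse repeated '!', '?' or '.' characters; leave everything else as is."""
    return END_PUNCT_RUN.sub(r"\1", text)


print(squeeze_end_punctuation("Are you serious??!!???!"))  # Are you serious?!?!
print(squeeze_end_punctuation("Wait,,, what???"))          # Wait,,, what?
```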

2 answers

solution1
4 ACCEPTED 2010-12-19 21:26:39

solution2
3 2010-12-19 21:25:32