I want to replace any run of consecutive occurrences of the same punctuation character in a string with a single occurrence. For example, "Hello!!!" should become "Hello!".
The first thing that came to mind was:
for char in string.punctuation:
    # a callable replacement avoids a "bad escape" error when char is a backslash
    text = re.sub(re.escape(char) + "+", lambda m: char, text)
However, since this is going to run in a repetitive process, I was wondering if there is a way to achieve this in a single RE, in order to make it run faster. What do you think?
You could try:
text = re.sub(r"([" + re.escape(string.punctuation) + r"])\1+", r"\1", text)
This uses re.escape() to ensure that the punctuation characters are properly escaped as necessary. The \1 backreference refers to the part within the parentheses (), which is the first punctuation character matched. So this replaces any run of two or more of the same punctuation character with a single copy of that character.
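Since this will run repeatedly, one option is to compile the pattern once and reuse it. The following is only a sketch of that idea; the names REPEATED_PUNCT and collapse_repeats are made up for illustration and are not part of the re module.

```python
import re
import string

# Same pattern as in the one-liner, compiled once so it can be reused across calls.
REPEATED_PUNCT = re.compile(r"([" + re.escape(string.punctuation) + r"])\1+")

def collapse_repeats(text):
    """Replace each run of one repeated punctuation character with a single copy."""
    return REPEATED_PUNCT.sub(r"\1", text)

print(collapse_repeats("Wait... what??!!"))     # Wait. what?!
print(collapse_repeats("a -- b ?!?! c\\\\d"))   # a - b ?!?! c\d
```

Note that only identical neighbours are merged: "??!!" becomes "?!", while an alternating run such as "?!?!" is left alone, because \1 must match the exact character captured by the group.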
To collapse only specific characters, list them in the character class instead:
text = re.sub(r'([!?.])\1+', r'\1', text)
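If the set of characters to collapse varies, the same idea can be wrapped in a small factory. This is a hedged sketch: make_collapser and collapse_enders are hypothetical names, and it assumes chars is a non-empty string (an empty one would produce an invalid character class).

```python
import re

def make_collapser(chars):
    """Return a function that collapses repeats of any single character in `chars`."""
    # re.escape keeps characters such as ], ^ and - literal inside the class.
    pattern = re.compile("([" + re.escape(chars) + r"])\1+")
    return lambda text: pattern.sub(r"\1", text)

collapse_enders = make_collapser("!?.")
print(collapse_enders("Really?!!! Yes... --ok--"))  # Really?! Yes. --ok--
```

Characters outside the chosen set, such as the hyphens here, are left untouched.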