
Regex to replace a list of characters in Python

I have a list of characters that I want to find in a string, replacing each run of repeated occurrences with just one occurrence.

But I am facing two problems: when I loop over them, re.sub does not replace the repeated occurrences, and when I have a smiley like :) it replaces ':' with ':)', which I don't want.

Here is the code that I tried.

import re

end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    pattern = "[" + i + "]" + "+"
    str = re.sub(pattern,i,str)

If I try a single character on its own, it works:

str = re.sub("[.]+",".",str)

But looping over the list of characters gives an error. How can I solve these two problems? Thanks for the help.
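Both symptoms can be reproduced with a minimal sketch of the question's loop body for a single token. The helper name class_pattern_sub is hypothetical, not from the question. It shows that [:)]+ also rewrites a lone colon, because a character class matches any one of its characters, and that [:-)]+ is rejected because '-' inside a class forms a range from ':' to ')', which runs backwards.

```python
import re

def class_pattern_sub(token, text):
    """Hypothetical helper: the question's loop body applied to one token."""
    pattern = "[" + token + "]" + "+"   # e.g. ":)" becomes "[:)]+"
    return re.sub(pattern, token, text)

# [:)]+ matches a single ':' as well, so the colon in "Note:" is "collapsed" into ":)".
print(class_pattern_sub(":)", "Note: hi"))  # Note:) hi

# "[:-)]+" is parsed as a character range ':'..')', which is reversed,
# so compiling the pattern raises re.error and the loop stops here.
try:
    class_pattern_sub(":-)", "hello")
except re.error as exc:
    print("re.error:", exc)
```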

re.escape(str) escapes the regex special characters for you. Joining alternatives with | lets you match any one of them, and (?:…) groups without capturing. So:

import re

# Python 2 only: use the lazy versions of map/filter (Python 3's are already lazy).
try:
    from itertools import imap as map, ifilter as filter
except ImportError:
    pass

# end_of_line_chars is the list from the question.
# Escape all elements, e.g. ':-)' → r':\-\)' (Python 2 also escapes ':', giving r'\:\-\)'):
esc = map(re.escape, end_of_line_chars)
# Wrap each element in a capturing group, so you know which element was found,
# and in a non-capturing group with repeats and optional trailing whitespace:
esc = map(r'(?:({})\s*)+'.format, esc)
# Compile one expression that matches any of these elements:
esc = re.compile('|'.join(esc))

# The function to turn a match of repeats into a single item:
def replace_with_one(match):
    # match.groups() holds one capture per element; only the one that matched is truthy,
    # e.g. (None, None, None, None, ':-)', None, None, None, None, None, None, None, None, None, None, None)
    return next(filter(bool, match.groups()))

# This is how you use it:
esc.sub(replace_with_one, '.... :-) :-) :-) :-( .....')
# Returns: '.:-):-(.'
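A possible alternative, sketched under the assumption that you only need to collapse runs of the same token: a single pattern with one named group and a backreference, (?P<tok>...)(?:\s*(?P=tok))*, so there is no need to search match.groups() for the one that matched. The names collapse_re and collapse are illustrative only.

```python
import re

# Same token list as in the question.
end_of_line_chars = [".", ";", "!", ":)", ":-)", "=)", ":]", ":-(", ":(", ":[",
                     "=(", ":P", ":-P", ":-p", ":p", "=P"]

# Longest tokens first, so a token is never cut short by a shorter token that
# happens to be its prefix (not an issue for this list, but cheap insurance).
alternatives = "|".join(
    re.escape(tok) for tok in sorted(end_of_line_chars, key=len, reverse=True)
)

# (?P<tok>...) captures whichever token matched; (?P=tok) is a backreference
# that only matches that very same text again, optionally after whitespace.
collapse_re = re.compile(r"(?P<tok>{})(?:\s*(?P=tok))*".format(alternatives))

def collapse(text):
    # Replace each run of identical tokens with a single copy of that token.
    return collapse_re.sub(r"\g<tok>", text)

print(collapse(".... :-) :-) :-) :-( ....."))  # . :-) :-( .
```

Unlike the answer's version, this keeps the whitespace that follows each run, so the sample input becomes '. :-) :-( .' rather than '.:-):-(.'.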

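If the things to replace are not single characters, character classes won't work: a class such as [:)] matches any one of its characters, so it matches a lone ':' too, and in [:-)] the '-' is read as a (reversed) range. Instead, use non-capturing groups (and use re.escape so the literals aren't interpreted as regex special characters):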

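import re

end_of_line_chars = [".",";","!",":)",":-)","=)",":]",":-(",":(",":[","=(",":P",":-P",":-p",":p","=P"]
for i in end_of_line_chars:
    # (?:...)+ repeats the whole escaped token, not its individual characters
    pattern = r"(?:{})+".format(re.escape(i))
    # text is the input string (called str in the question, which shadows the built-in)
    text = re.sub(pattern, i, text)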
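Wrapped in a function, the loop above can be tried on a sample string. The name squeeze_tokens and the sample input are hypothetical; the loop body is the answer's.

```python
import re

END_OF_LINE_CHARS = [".", ";", "!", ":)", ":-)", "=)", ":]", ":-(", ":(", ":[",
                     "=(", ":P", ":-P", ":-p", ":p", "=P"]

def squeeze_tokens(text, tokens=END_OF_LINE_CHARS):
    """Hypothetical helper: collapse back-to-back repeats of each token."""
    for token in tokens:
        # (?:...)+ repeats the whole escaped token, so ':)' no longer matches a lone ':'
        pattern = r"(?:{})+".format(re.escape(token))
        text = re.sub(pattern, token, text)
    return text

print(squeeze_tokens("Note: great!!! :):):)"))  # Note: great! :)
```

It only collapses tokens that are directly adjacent, so ':) :)' is left as it is, while the colon in "Note:" is no longer touched.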

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For questions, contact yoyou2525@163.com.

 