
Python regular expression: how to deal with multiple backslashes (\)

I'm working with text data and having trouble erasing multiple backslashes. I found that re.sub works quite well, so I wrote the code below to erase a backslash plus r, n, t, f or v:

temp_string = re.sub(r"[\t\n\r\f\v]"," ",string)

However, the code above can't deal with the string below.

string = '\\\\r \\\\nLove the filtered water and crushed ice in the door.'

So I wrote this instead:

temp_string = re.sub(r"[\\\\t\\\\n\\\\r\\\\f\\\\v]"," ",string)

But the result is wrong: it erases every v, f, n and so on in the text, and I don't know why this happens.

I found that .replace("\\\\r", " ") works. However, that way I would have to chain a call for every variant, like this:

.replace("\\\\r", " ")

.replace("\\\r", " ")

.replace("\\r", " ")

.replace("\r", " ")

.replace("\\\\t", " ")


I'm pretty sure there's a better way.

You can't define a sequence of characters inside a character class. A character class matches a single character. So [\\\\t\\\\n\\\\r\\\\f\\\\v] is equivalent to [\\tnrfv] and matches either a backslash or one of the letters t, n, r, f or v.
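A quick way to check this, as a minimal sketch using the sample string from the question (the variable names are illustrative):

```python
import re

# Sample from the question: two backslashes + "r", a space,
# two backslashes + "n", then ordinary text.
text = '\\\\r \\\\nLove the filtered water and crushed ice in the door.'

# The class from the question and its simplified form.
verbose_class = re.compile(r"[\\\\t\\\\n\\\\r\\\\f\\\\v]")
simple_class = re.compile(r"[\\tnrfv]")

# Both classes match the same set of single characters.
matched_verbose = sorted(set(verbose_class.findall(text)))
matched_simple = sorted(set(simple_class.findall(text)))
print(matched_verbose == matched_simple)

# Every backslash and every t, n, r, f, v letter is replaced,
# including the letters inside ordinary words.
damaged = verbose_class.sub(" ", text)
print(damaged)
```

This is why words such as "Love" lose their letters: the class treats each listed character independently, not as a backslash-plus-letter pair.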

To match a sequence of characters, you need to write them one after another. To match the two-character string \n (a backslash followed by n), use the pattern \\n (r'\\n'). To match either \n or \v as text, use \\n|\\v, (?:\\n|\\v) or, better, \\[nv].
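The three spellings can be compared side by side. This is a small sketch with a made-up input string; only the patterns come from the paragraph above:

```python
import re

# Hypothetical input: text containing a literal backslash + n and a literal backslash + v.
text = 'line one\\nline two\\vend'

# Three equivalent ways to match the two-character texts \n or \v.
patterns = [r'\\n|\\v', r'(?:\\n|\\v)', r'\\[nv]']
results = [re.sub(p, ' ', text) for p in patterns]
print(results)
```

All three give the same output, and the letter n inside "line" and "one" is left alone because it is not preceded by a backslash.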

So, if you want to match a backslash followed by a letter from the rtnfv set, or the "\t" (TAB), "\n" (line feed), "\r" (carriage return), "\f" (form feed) or "\v" (vertical tab) characters themselves, you can use
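Patterns matching that description might look like the following. This is a sketch reconstructed from the wording above, not necessarily the answer's original code; the sample text and variable names are illustrative, and the `\\+` variant at the end is an added assumption for data with several backslashes in a row.

```python
import re

# Two backslashes + r, space, two backslashes + n, text, then a real TAB and line feed.
text = '\\\\r \\\\nLove the\t\nfiltered water'

# A backslash followed by one of r, t, n, f, v, or a real control character.
single = r'\\[rtnfv]|[\t\n\r\f\v]'
# One or more consecutive occurrences of the above, in any mix.
repeated = r'(?:\\[rtnfv]|[\t\n\r\f\v])+'

out_single = re.sub(single, ' ', text)
out_repeated = re.sub(repeated, ' ', text)

# Variant (an assumption, not from the answer): \\+ also consumes
# any extra backslashes in front of the letter.
repeated_plus = r'(?:\\+[rtnfv]|[\t\n\r\f\v])+'
out_plus = re.sub(repeated_plus, ' ', text)

print(repr(out_single))
print(repr(out_repeated))
print(repr(out_plus))
```

With a single `\\` in the pattern, one backslash of each doubled pair is left behind in the sample; the `\\+` variant removes them all.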


The last one matches one or more consecutive occurrences of the patterns that may be mixed with each other.

Since escape characters are not the same as characters with a backslash before them, you will need to define a mapping for the escape characters you want to replace.

import re

string = '\\\\r \\\\\nLove the \nfiltered \\twater \\and crushed ice in the door.'

esc_map = {'\\n': '\n',
           '\\t': '\t',
           '\\r': '\r'}

# replace characters that should be escaped characters
for key, value in esc_map.items():
    string = string.replace(key, value)

# group escape character that might have backslashes prefixed 
re_str = r'\\*({})'.format(r'|'.join(esc_map.values()))
# remove extra backslashes
string = re.sub(re_str,r'\1',string)
# replace an escape character with a space
string = re.sub(re_str,r' ',string)
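The mapping step is needed because an escape character and a backslash-plus-letter pair are different strings. A minimal illustration (the variable names are hypothetical):

```python
escape_char = '\n'      # one character: a line feed
two_chars = '\\n'       # two characters: a backslash and the letter n
raw_two_chars = r'\n'   # raw-string spelling of the same two characters

lengths = (len(escape_char), len(two_chars), len(raw_two_chars))
same_spelling = two_chars == raw_two_chars
# The mapping turns the two-character text into the real escape character.
converted = two_chars.replace('\\n', '\n')

print(lengths)
print(same_spelling)
print(converted == escape_char)
```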

