
Python regular expression: how to deal with multiple backslashes (\)

I'm working with text data and having trouble removing multiple backslashes. I found that re.sub works quite well, so I wrote the code below to remove a backslash followed by r, n, t, f or v:

temp_string = re.sub(r"[\t\n\r\f\v]"," ",string)

However, the code above can't deal with the string below.

string = '\\\\r \\\\nLove the filtered water and crushed ice in the door.'

So I coded it like this:

temp_string = re.sub(r"[\\\\t\\\\n\\\\r\\\\f\\\\v]"," ",string)
temp_string

But the result is not what I expected, and I don't know why this happens: it erases every v, f, n and so on.

I found that .replace("\\\\r"," ") works. However, that way I would have to chain calls like this:

.replace("\\\\r"," ")
.replace("\\\r"," ")
.replace("\\r"," ")
.replace("\r"," ")
.replace("\\\\t"," ")
…

I'm pretty sure there is a better way.

You can't define a sequence of characters inside a character class. Character classes are meant to match a single character. So, [\\\\t\\\\n\\\\r\\\\f\\\\v] is equal to [\\tnrfv] and matches either a backslash or one of the letters t, n, r, f or v.
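A quick way to check this claim is to run both classes over the string from the question and compare what they match. This is only a sketch; the names long_class, short_class, long_hits and short_hits are illustrative and not part of the original answer.

```python
import re

# The string from the question: two literal backslashes followed by a letter.
string = '\\\\r \\\\nLove the filtered water and crushed ice in the door.'

# Repeating a character inside a class changes nothing, so both patterns
# describe the same set: backslash, t, n, r, f, v.
long_class = re.compile(r"[\\\\t\\\\n\\\\r\\\\f\\\\v]")
short_class = re.compile(r"[\\tnrfv]")

long_hits = long_class.findall(string)
short_hits = short_class.findall(string)

print(long_hits == short_hits)       # both classes match the same characters
print(long_class.sub(" ", string))   # ordinary letters such as the "v" in "Love" are erased too
```

This is why the attempt in the question removes letters from ordinary words: every t, n, r, f and v is matched on its own, whether or not a backslash comes before it.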

To match a sequence of chars, you need to list them one after another. To match the two-char string \n (a backslash followed by n) you need the \\n pattern (r'\\n'). If you need to match either the \n or the \v text, you can use \\n|\\v, (?:\\n|\\v) or, better, \\[nv].

So, if you want to match a backslash followed by a letter from the rtnfv char set, or the "\t" (TAB), "\n" (line feed), "\r" (carriage return), "\f" (form feed) or "\v" (vertical tab) chars, you can use any of the following:

r'\\[rtnfv]|[\t\n\r\f\v]'
r'(?:\\[rtnfv]|[\t\n\r\f\v])'
r'(?:\\[rtnfv]|[\t\n\r\f\v])+'

The last one matches one or more consecutive occurrences of the patterns that may be mixed with each other.
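As a rough illustration, the last pattern can be applied to the string from the question with re.sub. Note that \\[rtnfv] consumes only one backslash before the letter, so one of the two backslashes in the sample survives. The second pattern below, which uses \\+ to swallow a whole run of backslashes, is an assumption about what the asker wants and is not part of the original answer; the variable names are illustrative as well.

```python
import re

# The string from the question.
string = '\\\\r \\\\nLove the filtered water and crushed ice in the door.'

# Pattern from the answer: one backslash + letter, or a real control character.
pattern = re.compile(r'(?:\\[rtnfv]|[\t\n\r\f\v])+')
cleaned = pattern.sub(' ', string)
print(repr(cleaned))
# '\\  \\ Love the filtered water and crushed ice in the door.'

# Assumed variant: \\+ also absorbs any extra backslashes before the letter.
pattern_runs = re.compile(r'(?:\\+[rtnfv]|[\t\n\r\f\v])+')
cleaned_runs = pattern_runs.sub(' ', string)
print(repr(cleaned_runs))
# '   Love the filtered water and crushed ice in the door.'
```

Unlike the character class in the question, neither pattern touches the letters inside ordinary words, because a letter is only matched when a backslash comes directly before it.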

Since escape characters are not the same as characters with a backslash before them, you will need to define a mapping for the escape characters you want to replace.

import re

string = '\\\\r \\\\\nLove the \nfiltered \\twater \\and crushed ice in the door.'

esc_map = {'\\n': '\n',
           '\\t': '\t',
           '\\r': '\r'}

# replace characters that should be escaped characters
for key, value in esc_map.items():
    string = string.replace(key, value)

# group escape character that might have backslashes prefixed 
re_str = r'\\*({})'.format(r'|'.join(esc_map.values()))
# remove extra backslashes
string = re.sub(re_str,r'\1',string)
# replace an escape character with a space
string = re.sub(re_str,r' ',string)
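The same steps could be wrapped in a small helper so they can be reused on other strings. This is a sketch only: the function name strip_escapes, the constant ESC_MAP and the repl parameter are hypothetical, and the two re.sub calls above are merged into one substitution that drops the leading backslashes and the escape character together.

```python
import re

# Hypothetical mapping, same entries as esc_map above:
# backslash + letter (two chars) -> the real control character.
ESC_MAP = {'\\n': '\n', '\\t': '\t', '\\r': '\r'}

def strip_escapes(text, esc_map=ESC_MAP, repl=' '):
    """Replace escape sequences, plus any backslashes before them, with repl."""
    # Turn the two-char sequences into real control characters.
    for key, value in esc_map.items():
        text = text.replace(key, value)
    # Zero or more backslashes followed by one of the control characters.
    alternatives = '|'.join(re.escape(value) for value in esc_map.values())
    return re.sub(r'\\*(?:{})'.format(alternatives), repl, text)

sample = '\\\\r \\\\\nLove the \nfiltered \\twater \\and crushed ice in the door.'
result = strip_escapes(sample)
print(repr(result))
# '   Love the  filtered  water \\and crushed ice in the door.'
```

The backslash in "\and" is kept because "\a" is not in the mapping; extending ESC_MAP would be the place to change that.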
