
Python unescaping string in regex replacements

The output of the code below:

import re

rpl = 'This is a nicely escaped newline \\n'  # ends in a literal backslash followed by n
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile('apple')
reg.sub(rpl, my_string)

..is:

'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'

..so when printed:

I hope this This is a nicely escaped newline
 is replaced with a nicely escaped string

So Python is unescaping the string when it replaces 'apple' in the other string? For now I've just done:

reg.sub( rpl.replace('\\','\\\\'), my_string )

Is this safe? Is there a way to stop Python from doing that?

From help(re.sub) (the part that matters here is the clause about backslash escapes in repl):

sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
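To make the quoted behaviour concrete, here is a small sketch of what "processed" means for a string repl. The sample text and variable names are made up for illustration; only re.sub itself comes from the documentation above.

```python
import re

# Hypothetical sample data, chosen only to illustrate the documented behaviour.
text = 'key=value'

# Group references such as \1 and \2 in a string repl are expanded.
swapped = re.sub(r'(\w+)=(\w+)', r'\2=\1', text)

# A backslash followed by n in a string repl becomes one newline character.
with_newline = re.sub('=', r'\n', text)

# A doubled backslash in repl yields one literal backslash, so the n survives.
literal = re.sub('=', r'\\n', text)

print(repr(swapped))
print(repr(with_newline))
print(repr(literal))
```

The same mechanism that makes group references work is what turns the backslash-n pair in the question into a real newline.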

One way to get around this is to pass a lambda, since the return value of a callable is used as-is:

>>> reg.sub(rpl, my_string )
'I hope this This is a nicely escaped newline \n is replaced with a nicely escaped string'
>>> reg.sub(lambda x: rpl, my_string )
'I hope this This is a nicely escaped newline \\n is replaced with a nicely escaped string'
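As a rough side-by-side sketch, the snippet below compares the plain string repl, the callable approach, and the backslash-doubling workaround from the question. The helper name sub_literal is hypothetical (it is not part of re), and my_string is shortened for readability.

```python
import re


def sub_literal(pattern, replacement, string):
    """Hypothetical helper: insert replacement verbatim for every match.

    A callable repl is not escape-processed, so its return value is used as-is.
    """
    return re.sub(pattern, lambda match: replacement, string)


rpl = 'This is a nicely escaped newline \\n'  # ends in backslash + n
my_string = 'I hope this apple is replaced'

# String repl: backslash + n is turned into a real newline.
processed = re.sub('apple', rpl, my_string)

# Callable repl: backslash + n is kept as two characters.
verbatim = sub_literal('apple', rpl, my_string)

# Workaround from the question: double every backslash before substituting.
doubled = re.sub('apple', rpl.replace('\\', '\\\\'), my_string)

print(repr(processed))
print(repr(verbatim))
print(verbatim == doubled)
```

Since the backslash is the only character with special meaning in a replacement string, doubling it should be safe. re.escape, on the other hand, is meant for search patterns and is not a good fit for repl.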

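Every string passed to Python's re module has its backslash escapes processed by the regex engine, and that includes both search patterns and replacement strings. This happens on top of the escape processing Python already applies to ordinary string literals. This is why the r modifier is generally used with regex patterns in Python, as it reduces the amount of "backwhacking" necessary to write usable patterns.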

The r modifier appears before a string constant and makes all \ characters in it verbatim (a backslash can still precede a quote to keep it from ending the string, but the backslash itself stays in the result). So, r'\\' == '\\\\' , and r'\n' == '\\n' .
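A quick way to check how raw and ordinary literals differ is to compare them directly. This is only an illustrative sketch, and the variable names are arbitrary.

```python
# Ordinary literal: the escape sequence is converted to one newline character.
plain = '\n'

# Raw literal: the backslash is kept, so this is two characters.
raw = r'\n'

print(len(plain))          # 1
print(len(raw))            # 2
print(raw == '\\n')        # True
print(r'\\' == '\\\\')     # True
```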

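Writing your example as:

import re

# Raw string: the two backslashes reach re.sub intact, which turns them into one literal backslash.
rpl = r'This is a nicely escaped newline \\n'
my_string = 'I hope this apple is replaced with a nicely escaped string'
reg = re.compile(r'apple')
reg.sub(rpl, my_string)

works as expected: the result contains a literal backslash followed by n rather than a newline.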
