
Python regular expression replacement doesn't work as I expect

I am trying to create a regular expression to replace part of a string. This is an example of the string:

import re

string = u'/nl/nl/1681/1/0/a/all/'
pattern = r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)'
pattern_obj = re.compile(pattern)

The pattern specifies 4 groups. If you do a search then the results are as follows:

m = pattern_obj.search(string)
m.group(0) -> u'/nl/nl/1681/1/0/a/all/'
m.group(1) -> u'/nl/nl/'
m.group(2) -> u'1681'
m.group(3) -> u'1'
m.group(4) -> u'/0/a/all/'

So far so good. Now I specify a replacement string as follows:

replacement = r'\1' + '1000' + '/' + '20' + r'\4'

and issue the following statement:

pattern_obj.sub(replacement,string)

and this results in:

u'H00/20/0/a/all/'

I expected this:

u'/nl/nl/1000/20/0/a/all/'

I must be doing something wrong but I don't know what. Can anybody help me out?

Your replacement string, once fully assembled, is \11000/20\4. The re module reads \110 as the octal escape for H rather than as a back-reference to group 1 followed by the literal text 10.
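To see the ambiguity in isolation, here is a small sketch that reuses the pattern from the question. The names assembled, octal_demo and result are only illustrative. It assumes Python's re module, whose replacement templates treat a backslash followed by three octal digits as a character code.

```python
import re

pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')

# The pieces are concatenated before re ever sees them, so the digits of
# '1000' end up directly behind the \1.
assembled = r'\1' + '1000' + '/' + '20' + r'\4'
print(assembled)

# A backslash plus three octal digits in a replacement template is an octal
# escape: octal 110 is decimal 72, the code point of 'H'.
octal_demo = re.sub(r'x', r'\110', 'x')
print(octal_demo)

# So the template is read as 'H' + '00/20' + group 4, and group 1 is never used.
result = pattern_obj.sub(assembled, '/nl/nl/1681/1/0/a/all/')
print(result)
```

The last line reproduces the unexpected value from the question.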

You need to write \g<1> instead of \1 to make it unambiguously a back-reference. See the documentation for re.sub.
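A minimal sketch of that fix, using the pattern and input from the question. The replacement function build is a hypothetical alternative that was not part of the original answer. It relies on re.sub accepting a callable, which avoids template escapes altogether.

```python
import re

pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')
string = '/nl/nl/1681/1/0/a/all/'

# \g<1> closes the group reference explicitly, so the digits that follow
# are treated as literal text instead of extending the escape.
replacement = r'\g<1>' + '1000' + '/' + '20' + r'\g<4>'
fixed = pattern_obj.sub(replacement, string)
print(fixed)

# Alternative (assumption, not from the answer): pass a function instead of
# a template; it receives the match object and returns the replacement text.
def build(match):
    return match.group(1) + '1000/20' + match.group(4)

fixed_via_function = pattern_obj.sub(build, string)
print(fixed_via_function)
```

Both calls should give the string the question expected. The function form can be easier to read when the inserted values come from variables that may start with a digit.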
