Python regular expression replacement doesn't work as I expect

Question

I am trying to create a regular expression to replace part of a string. This is an example of the string:

import re

string = u'/nl/nl/1681/1/0/a/all/'
pattern = r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)'
pattern_obj = re.compile(pattern)

The pattern specifies 4 groups. If you do a search then the results are as follows:

m = pattern_obj.search(string)
m.group(0) -> u'/nl/nl/1681/1/0/a/all/'
m.group(1) -> u'/nl/nl/'
m.group(2) -> u'1681'
m.group(3) -> u'1'
m.group(4) -> u'/0/a/all/'

So far so good. Now I specify a replacement string as follows:

replacement = r'\1' + '1000' + '/' + '20' + r'\4'

and issue the following statement:

pattern_obj.sub(replacement,string)

and this results in:

u'H00/20/0/a/all/'

I expected this:

u'/nl/nl/1000/20/0/a/all/'

I must be doing something wrong but I don't know what. Can anybody help me out?

Answer 1

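Your replacement string, once fully assembled, is \11000/20\4. The leading \110 is interpreted as the octal escape for the character H rather than as a back-reference to group 1 followed by the literal text 10. The remaining 00/20 is then copied literally and \4 expands as intended, which is how you end up with H00/20/0/a/all/.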
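
A minimal sketch that reproduces the behaviour, assuming Python 3 (where the u prefix is optional) and reusing the pattern from the question. The variable names ambiguous and result are only illustrative.

```python
import re

string = '/nl/nl/1681/1/0/a/all/'
pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')

# Same replacement as in the question: the pieces join into \11000/20\4
ambiguous = r'\1' + '1000' + '/' + '20' + r'\4'
print(ambiguous)        # \11000/20\4

# \110 is three octal digits, so re treats it as the character with code 0o110
print(chr(0o110))       # H

result = pattern_obj.sub(ambiguous, string)
print(result)           # H00/20/0/a/all/
```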

Write \g<1> instead of \1 so that it is unambiguously a back-reference. See the documentation for re.sub.
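
As a sketch of the fix, assuming Python 3 and the same pattern as in the question, the \g<n> form keeps the group number separate from the digits that follow it. The function-based variant at the end is an alternative I am adding for illustration; it is not part of the original answer.

```python
import re

string = '/nl/nl/1681/1/0/a/all/'
pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')

# \g<1> ends at the closing bracket, so the following "1000" stays literal text
replacement = r'\g<1>' + '1000' + '/' + '20' + r'\g<4>'
fixed = pattern_obj.sub(replacement, string)
print(fixed)            # /nl/nl/1000/20/0/a/all/

# Alternative: pass a function, which avoids escape parsing altogether
fixed_fn = pattern_obj.sub(lambda m: m.group(1) + '1000/20' + m.group(4), string)
print(fixed_fn)         # /nl/nl/1000/20/0/a/all/
```

Using \g<4> for the last group is not strictly required here, since \4 is followed by nothing, but keeping both references in the same form makes the replacement easier to read.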

solution1
3 2012-09-13 16:34:54
