
Python regular expression replacement doesn't work as I expect

I am trying to create a regular expression to replace part of a string. This is an example of the string:

import re

string = u'/nl/nl/1681/1/0/a/all/'
pattern = r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)'
pattern_obj = re.compile(pattern)

The pattern specifies 4 groups. If you do a search, the results are as follows:

m = pattern_obj.search(string)
m.group(0) -> u'/nl/nl/1681/1/0/a/all/'
m.group(1) -> u'/nl/nl/'
m.group(2) -> u'1681'
m.group(3) -> u'1'
m.group(4) -> u'/0/a/all/'

So far so good. Now I specify a replacement string as follows:

replacement = r'\1' + '1000' + '/' + '20' + r'\4'

and issue the following statement:

pattern_obj.sub(replacement,string)

and this results in:

u'H00/20/0/a/all/'

I expected this:

u'/nl/nl/1000/20/0/a/all/'

I must be doing something wrong, but I don't know what. Can anybody help me out?

Your replacement string, once fully assembled, is \11000/20\4. The leading \110 is interpreted as the octal escape for H (octal 110 is 72, the code point of H) rather than as a back-reference to group 1 followed by the literal text 10. That is why the output starts with H followed by the remaining 00/20 and then group 4.
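To make the ambiguity visible, here is a small sketch that reuses the pattern and input from the question. The variable names octal_only and broken are only illustrative; the behaviour shown is what the re module does with three consecutive octal digits after a backslash in a replacement template.

```python
import re

pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')
string = '/nl/nl/1681/1/0/a/all/'

# The pieces concatenate into a single template: \11000/20\4
replacement = r'\1' + '1000' + '/' + '20' + r'\4'
print(replacement)

# A template consisting only of \110 is read as an octal escape
# (0o110 == 72 == ord('H')), not as group 1 followed by "10".
octal_only = pattern_obj.sub(r'\110', string)
print(octal_only)

# The full template therefore yields 'H', then the literal '00/20', then group 4.
broken = pattern_obj.sub(replacement, string)
print(broken)
```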

You need to write \g<1> instead of \1 so that it is unambiguously a back-reference: the closing > marks where the group number ends, so the digits that follow are treated as literal text. See the documentation for re.sub.
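A minimal sketch of the fix, again using the pattern and input from the question. Using \g<4> for the last group is optional (a plain \4 is not ambiguous there) and is done only for consistency. The build function is a hypothetical alternative, not part of the original answer, that sidesteps template escapes entirely.

```python
import re

pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')
string = '/nl/nl/1681/1/0/a/all/'

# \g<1> closes the group reference explicitly, so '1000' stays literal text.
replacement = r'\g<1>' + '1000' + '/' + '20' + r'\g<4>'
fixed = pattern_obj.sub(replacement, string)
print(fixed)  # /nl/nl/1000/20/0/a/all/

# Alternative: pass a function, so no template parsing is involved at all.
def build(match):
    return match.group(1) + '1000/20' + match.group(4)

fixed_via_function = pattern_obj.sub(build, string)
print(fixed_via_function)  # /nl/nl/1000/20/0/a/all/
```

The function form can be easier to read when the inserted values come from variables, since nothing in them can be mistaken for an escape sequence.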
