Python regular expression replacement doesn't work as I expect
I am trying to create a regular expression to replace part of a string. This is an example of the string:
import re

string = u'/nl/nl/1681/1/0/a/all/'
pattern = r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)'
pattern_obj = re.compile(pattern)
The pattern specifies 4 groups. If you do a search, the results are as follows:
m = pattern_obj.search(string)
m.group(0) -> u'/nl/nl/1681/1/0/a/all/'
m.group(1) -> u'/nl/nl/'
m.group(2) -> u'1681'
m.group(3) -> u'1'
m.group(4) -> u'/0/a/all/'
So far so good. Now I specify a replacement string as follows:
replacement = r'\1' + '1000' + '/' + '20' + r'\4'
and issue the following statement:
pattern_obj.sub(replacement,string)
and this results in:
u'H00/20/0/a/all/'
I expected this:
u'/nl/nl/1000/20/0/a/all/'
I must be doing something wrong, but I don't know what. Can anybody help me out?
Your replacement string, when it's fully assembled, is \11000/20\4, and \110 gets interpreted as the octal escape for H (octal 110 is decimal 72, the character code of H) rather than as a back-reference to group number 1 followed by 10.
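A minimal sketch of that interpretation, assuming Python 3 and a throwaway one-group pattern `(a)` that is not part of the original question:

```python
import re

# Octal 110 equals decimal 72, which is the code point of 'H'.
octal_char = chr(0o110)

# Inside a replacement template, \110 is parsed as that octal escape,
# not as "group 1 followed by the literal digits 10".
result = re.sub(r'(a)', r'\110', 'a')

print(octal_char, result)
```

Both values come out as the single character H, which is where the stray H in the unexpected output originates.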
You need to write \g<1> instead of \1 to make sure that it's unambiguously a back-reference. See the documentation for re.sub.
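As a sketch of the fix applied to the string and pattern from the question (the variable names `ambiguous`, `explicit`, `broken`, and `fixed` are introduced here only for illustration; `\g<4>` is used for the last group purely for consistency, since `\4` at the end of the template is not ambiguous):

```python
import re

string = '/nl/nl/1681/1/0/a/all/'
pattern_obj = re.compile(r'(/\w{2}/\w{2}/)(\d+)/(\d+)(/\d+/[ans]/all/)')

# Original template: \1 runs into the digits that follow it and is read as \110.
ambiguous = r'\1' + '1000' + '/' + '20' + r'\4'

# Explicit template: \g<1> ends the group reference before the literal digits.
explicit = r'\g<1>' + '1000' + '/' + '20' + r'\g<4>'

broken = pattern_obj.sub(ambiguous, string)
fixed = pattern_obj.sub(explicit, string)

print(broken)  # H00/20/0/a/all/
print(fixed)   # /nl/nl/1000/20/0/a/all/
```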