I want to modify a string with the help of re.sub:
>>> re.sub("sparta", r"<b>\1</b>", "Here is Sparta.", flags=re.IGNORECASE)
I expect to get:
'Here is <b>Sparta</b>.'
But I get an error instead:
>>> re.sub("sparta", r"<b>\1</b>", "Here is Sparta.", flags=re.IGNORECASE)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib/python2.7/re.py", line 291, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib/python2.7/sre_parse.py", line 833, in expand_template
raise error, "invalid group reference"
sre_constants.error: invalid group reference
How should I use re.sub to get the correct result?
Your pattern does not define any capturing group, but the replacement string uses a backreference to Group 1 (\1). Since Group 1 does not exist, re.sub raises the "invalid group reference" error.
Either define a capturing group in the pattern and use the matching backreference in the replacement string, or use the \g<0> backreference to the whole match:
re.sub("sparta", r"<b>\g<0></b>", "Here is Sparta.", flags=re.IGNORECASE)
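A minimal sketch comparing the failing call with both fixes, assuming Python 3 (the traceback above is from Python 2.7, which fails the same way). The first fix wraps the pattern in parentheses so that \1 has a group to refer to; the second keeps the pattern and uses \g<0>.

```python
import re

text = "Here is Sparta."

# The original call: \1 refers to group 1, but the pattern defines no groups.
try:
    re.sub("sparta", r"<b>\1</b>", text, flags=re.IGNORECASE)
except re.error as exc:
    print("error:", exc)  # prints the "invalid group reference" message

# Fix 1: add a capturing group so that \1 exists.
fixed_group = re.sub("(sparta)", r"<b>\1</b>", text, flags=re.IGNORECASE)

# Fix 2: keep the pattern and refer to the whole match with \g<0>.
fixed_whole = re.sub("sparta", r"<b>\g<0></b>", text, flags=re.IGNORECASE)

print(fixed_group)  # Here is <b>Sparta</b>.
print(fixed_whole)  # Here is <b>Sparta</b>.
```

Both fixes keep the original capitalization ("Sparta") because the replacement copies the matched text rather than the lowercase pattern.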
When the replacement string (the second argument of re.sub) contains \x, where x is a group number such as 1 or 2, Python replaces it with the text matched by group x.
You can define a group in your regex by wrapping it with parentheses, like so:
re.sub(r"capture (['me]+)", r'group 1: \1', 'capture me!')  # => 'group 1: me!'
re.sub(r"capture (['me]+)", r'group 1: \1', "capture 'em!")  # => "group 1: 'em!"
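The same mechanism works with several groups at once. A small sketch (the date pattern and input string are made-up illustrations, not from the question) that uses \1, \2 and \3 to reorder the parts of a matched date:

```python
import re

# Three capturing groups: year (1), month (2), day (3).
pattern = r"(\d{4})-(\d{2})-(\d{2})"

# Reorder the groups in the replacement string by their numbers.
result = re.sub(pattern, r"\3/\2/\1", "Released on 2014-07-21.")
print(result)  # Released on 21/07/2014.
```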
Nested captures? I've lost the count!
Groups are numbered by the position of their opening parenthesis, counting from the left:
(this is the first group (this is the second) (this is the third))
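A quick way to check the numbering is to inspect the match object. This sketch uses a tiny nested pattern chosen only for illustration:

```python
import re

# Group 1 opens first and encloses groups 2 and 3.
m = re.match(r"(a(b)(c))", "abc")

print(m.group(1))  # abc
print(m.group(2))  # b
print(m.group(3))  # c
print(m.groups())  # ('abc', 'b', 'c')
```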
Named groups are useful when you work with the match object returned by re.match or re.search, for example (see the docs for more), and also in complex regexes, because they make the pattern easier to read.
You can name a group with the following syntax:
(?P<your_group_name>your pattern)
So, for example:
re.sub(r"(?P<first>hello(?P<second>[test]+)) (?P<third>[a-z])", r"first: \g<first>", "hellotest a")  # => 'first: hellotest'
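A sketch showing both uses of named groups with the pattern from the example above; the input string "say hellotest a" is a hypothetical one chosen so that the pattern matches. Named groups can be read from the match object by name, and referenced in a replacement with \g<name>:

```python
import re

pattern = r"(?P<first>hello(?P<second>[test]+)) (?P<third>[a-z])"
m = re.search(pattern, "say hellotest a")

# Read named groups from the match object.
print(m.group("first"))   # hellotest
print(m.group("second"))  # test
print(m.groupdict())      # {'first': 'hellotest', 'second': 'test', 'third': 'a'}

# Named groups are still numbered too: "first" is also group 1.
print(m.group(1))         # hellotest

# Reference named groups in a replacement string with \g<name>.
replaced = re.sub(pattern, r"[\g<third>|\g<second>]", "say hellotest a")
print(replaced)           # say [a|test]
```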
Group 0 is the entire match. However, you can't refer to it as \0 in the replacement string, because \0 is read as an octal escape and produces the NUL character \x00. Instead, use the named-group syntax \g<0> (numbered groups behave like named groups whose name is just an integer). For example:
re.sub(r'[hello]+', r'\g<0>', 'lehleo') # => lehleo
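To see the difference between \0 and \g<0> side by side, this sketch wraps the match in angle brackets (the brackets are only there to make the result visible):

```python
import re

# \0 in a replacement template is an octal escape: it inserts chr(0), not the match.
nul_version = re.sub(r"[hello]+", r"<\0>", "lehleo")
print(repr(nul_version))  # '<\x00>'

# \g<0> inserts the whole match.
whole_version = re.sub(r"[hello]+", r"<\g<0>>", "lehleo")
print(whole_version)      # <lehleo>
```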
This answer is only meant to explain capturing groups rather than answer your question directly, since @Wiktor Stribiżew's answer already covers it perfectly.