How can I use re.sub with IGNORECASE in Python 2.7?

I want to modify a string with re.sub:

>>> re.sub("sparta", r"<b>\1</b>", "Here is Sparta.", flags=re.IGNORECASE)

I expect to get:

'Here is <b>Sparta</b>.'

But I get an error instead:

>>> re.sub("sparta", r"<b>\1</b>", "Here is Sparta.", flags=re.IGNORECASE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python2.7/re.py", line 291, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.7/sre_parse.py", line 833, in expand_template
    raise error, "invalid group reference"
sre_constants.error: invalid group reference

How should I use re.sub to get the correct result?

Your pattern does not define any capturing group, yet the replacement string uses \1, a backreference to group 1. Since group 1 does not exist, re.sub raises "invalid group reference".

Either define a capturing group in the pattern and use the matching backreference in the replacement, or use \g<0>, which refers to the whole match:

re.sub("sparta", r"<b>\g<0></b>", "Here is Sparta.", flags=re.IGNORECASE)
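
Both options described above can be compared side by side. This is a minimal sketch; the variable names are only illustrative, and it should behave the same on Python 2.7 and Python 3:

```python
import re

text = "Here is Sparta."

# Option 1: wrap the pattern in parentheses so that group 1 exists.
with_group = re.sub(r"(sparta)", r"<b>\1</b>", text, flags=re.IGNORECASE)

# Option 2: keep the pattern as it is and refer to the whole match with \g<0>.
whole_match = re.sub(r"sparta", r"<b>\g<0></b>", text, flags=re.IGNORECASE)

print(with_group)   # Here is <b>Sparta</b>.
print(whole_match)  # Here is <b>Sparta</b>.
```

In both cases the replacement reuses the text as it appeared in the input, so the capital "S" is preserved even though the pattern is lowercase.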

When you use \x in the replacement string (the second argument to re.sub), where x is a group number such as 1, Python replaces it with the text captured by group x.

You can define a group in your regex by wrapping it with parentheses, like so:

re.sub(r"capture (['me]{2})", r'group 1: \1', 'capture me!') # => group 1: me!
re.sub(r"capture (['me]{2})", r'group 1: \1', "capture 'em!") # => group 1: 'em!

Nested captures? I've lost count!

A group's number is determined by the position of its opening parenthesis, counting from left to right:

(this is the first group (this is the second) (this is the third))
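
To see the numbering in practice, here is a small sketch with one outer group and two nested ones. The pattern and input string are made up for illustration:

```python
import re

# Groups are numbered by the order of their opening parentheses.
pattern = r"(first (second) (third))"
match = re.match(pattern, "first second third")

outer = match.group(1)      # opened first, so it is group 1
inner_a = match.group(2)    # second opening parenthesis
inner_b = match.group(3)    # third opening parenthesis

print(match.groups())  # ('first second third', 'second', 'third')
```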

Named groups

Named groups are useful when you work with the match object returned by re.match or re.search (refer to the docs for more), and also in complex regexes, because they make the pattern easier to read.

You can name a group with the following syntax:

(?P<your_group_name>your pattern)

So, for example:

re.sub(r"(?P<first>hello(?P<second>[test]+)) (?P<third>[a-z])", r"first: \g<first>", "hellotest a") # => first: hellotest
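
Named groups can also be read from a match object. The following is a minimal sketch; the pattern, the group names `greeting` and `name`, and the input string are hypothetical and chosen only for illustration:

```python
import re

# Hypothetical pattern with two named groups.
pattern = r"(?P<greeting>hello) (?P<name>[a-z]+)"
text = "say hello world"
match = re.search(pattern, text)

by_name = match.group("greeting")  # look a group up by its name
by_number = match.group(1)         # named groups still have a number
as_dict = match.groupdict()        # all named groups as a dictionary

# \g<name> refers to a named group in the replacement string.
swapped = re.sub(pattern, r"\g<name> \g<greeting>", text)

print(by_name)   # hello
print(swapped)   # say world hello
```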

What is group 0?

Group 0 is the entire match. However, you can't use \0 in the replacement string, because it is read as an octal escape and inserts the character \x00. The solution is to use the named group syntax (numbered groups behave like named groups whose name is just an integer): \g<0>. So, for example:

re.sub(r'[hello]+', r'\g<0>', 'lehleo') # => lehleo
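
The difference between \0 and \g<0> is easier to see when the replacement adds visible characters around the match. This is a small sketch with a made-up input. The last line also shows the \g<1>0 form, which the re documentation describes as group 1 followed by a literal "0" (whereas \10 would refer to group 10):

```python
import re

text = "abc"

# \0 is an octal escape, so a NUL character is inserted instead of the match.
nul = re.sub(r"b", r"[\0]", text)

# \g<0> inserts the whole match.
whole = re.sub(r"b", r"[\g<0>]", text)

# \g<1>0 means group 1 followed by the literal character "0".
followed = re.sub(r"(b)", r"\g<1>0", text)

print(repr(nul))    # 'a[\x00]c'
print(whole)        # a[b]c
print(followed)     # ab0c
```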

For your problem

This answer is only meant to explain capturing groups rather than to answer your question directly, since @Wiktor Stribiżew's answer already does that perfectly.
