How can I make a subgroup reference (\g<1>) optional in re.sub()? For example, with:
import re
regexp = re.compile(r'^http://(lists\.|www\.)?example\.com/')
regexp.sub(
r'https://\g<1>example.com/',
r'http://example.com/helllo-there'
)
I would like \g<1> to be replaced with nothing if the optional subgroup isn't matched (rather than raising an exception).
I know I can use regexp.match(..).groups() to check which groups are present, but this seems like a lot of work to me (we would need a bunch of replacement patterns, since some examples go up to \g<6>). It's also not very fast, since we need to do both a match and a replace.
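One way to avoid the separate match() call on any Python version is to pass a function as the replacement: re.sub calls it once per match with the match object, and the function can substitute an empty string for any group that is None. A minimal sketch (the function name to_https is just for illustration):

```python
import re

# The pattern from the question: group 1 is optional, so it may be None.
regexp = re.compile(r'^http://(lists\.|www\.)?example\.com/')

def to_https(match):
    # match.group(1) is None when the optional group did not participate,
    # so fall back to an empty string before concatenating.
    prefix = match.group(1) or ''
    return 'https://' + prefix + 'example.com/'

for url in ('http://example.com/helllo-there',
            'http://www.example.com/helllo-there',
            'http://lists.example.com/helllo-there'):
    print(regexp.sub(to_https, url))
```

With several optional groups (the question mentions up to \g<6>), the same fallback applies to each one, e.g. match.group(n) or ''.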
In JavaScript, for example, I can use $1, and if the group isn't matched it's simply ignored:
'http://example.com/helllo-there'.replace(
/^http:\/\/(lists\.|www\.)?example\.com\//,
'https://$1example.com/')
// Outputs: "https://example.com/helllo-there"
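Python has since adopted the JavaScript behavior: starting with Python 3.5, re.sub replaces references to unmatched groups with an empty string, so the exception only occurs on older versions (such as Python 2.7, where it raises re.error). Assuming Python 3.5 or newer, the original snippet works unchanged:

```python
import re
import sys

# Assumes Python 3.5+: unmatched groups in the replacement template
# are substituted with an empty string instead of raising re.error.
assert sys.version_info >= (3, 5)

regexp = re.compile(r'^http://(lists\.|www\.)?example\.com/')
result = regexp.sub(r'https://\g<1>example.com/', 'http://example.com/helllo-there')
print(result)  # https://example.com/helllo-there
```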
Another option is to provide an explicit empty alternative, so the group always matches (possibly the empty string):
regexp = re.compile(r'^http://(lists\.|www\.|)example\.com/')
Also, you can use just \1 instead of \g<1>.
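A quick check of the empty-alternative pattern against all three host forms, using \1 in the replacement:

```python
import re

# The empty third alternative means group 1 always participates in the
# match: it captures 'lists.', 'www.', or the empty string.
regexp = re.compile(r'^http://(lists\.|www\.|)example\.com/')

for url in ('http://example.com/helllo-there',
            'http://www.example.com/helllo-there',
            'http://lists.example.com/helllo-there'):
    # \1 is shorthand for \g<1>; it is safe here because the next
    # character ('e') is not a digit.
    print(regexp.sub(r'https://\1example.com/', url))
```

The \g<1> form is still needed when the reference is immediately followed by a digit, since \10 would be read as a reference to group 10.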
If I understand correctly, just do x(y?)z instead of x(y)?z.
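The difference is whether the group always takes part in the match. Using the placeholder letters from the answer, this shows what group 1 holds in each form when the optional part is absent:

```python
import re

# x(y)?z: the group itself is optional and may not participate -> None.
print(repr(re.match(r'x(y)?z', 'xz').group(1)))  # None

# x(y?)z: the group always participates; only its content is optional -> ''.
print(repr(re.match(r'x(y?)z', 'xz').group(1)))  # ''

# Because group 1 always matched, \1 in the replacement is always valid.
print(re.sub(r'x(y?)z', r'[\1]', 'xz'))  # []
```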
I would do it like this: put the alternatives inside a non-capturing group and make that group optional, then wrap the optional non-capturing group in a capturing group.
>>> re.sub(r'^http://((?:lists\.|www\.)?)example\.com/',r'https://\g<1>example.com/', 'http://example.com/helllo-there')
'https://example.com/helllo-there'