
Make a subgroup reference (\g<1>) optional in re.sub

How can I make a subgroup reference (\g<1>) optional in re.sub()? For example, with:

import re

regexp = re.compile(r'^http://(lists\.|www\.)?example\.com/')
regexp.sub(
    r'https://\g<1>example.com/',
    r'http://example.com/helllo-there'
)

I would like \g<1> to be replaced with nothing if the optional subgroup isn't matched (rather than raising an exception).

I know I can use regexp.match(..).groups() to check which groups are present, but this seems like a lot of work to me (we would need a bunch of replacement patterns, since some examples go up to \g<6>). It's also not very fast, since we need to do both a match and a replace.
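
For reference, the check-which-groups-matched idea can be done in a single pass, because re.sub() also accepts a callable as the replacement. The following is only a sketch; the helper name to_https is made up for illustration. It also sidesteps the "unmatched group" error that Python versions before 3.5 raise for \g<1> on a group that did not participate.

```python
import re

regexp = re.compile(r'^http://(lists\.|www\.)?example\.com/')

def to_https(match):
    # Hypothetical helper: group(1) is None when the optional
    # subgroup did not participate in the match, so fall back to ''.
    prefix = match.group(1) or ''
    return 'https://' + prefix + 'example.com/'

print(regexp.sub(to_https, 'http://example.com/helllo-there'))
print(regexp.sub(to_https, 'http://www.example.com/helllo-there'))
```

The trade-off is that the replacement logic lives in Python code rather than in a template string, which may or may not suit patterns that go up to \g<6>.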

For example, in JavaScript I can use $1; if the group isn't matched, it's just ignored:

'http://example.com/helllo-there'.replace(
    // Backslashes are doubled so the string passed to RegExp keeps "\."
    RegExp('^http://(lists\\.|www\\.)?example\\.com/'),
    'https://$1example.com/')
// Outputs: "https://example.com/helllo-there"

Another option is to provide an explicit empty alternative:

 regexp = re.compile(r'^http://(lists\.|www\.|)example\.com/')

Also, you can use just \1 instead of \g<1>.
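
As a quick check of the empty-alternative pattern, here is a minimal sketch using the question's URL plus one assumed extra URL with the lists. prefix. Because of the trailing "|", group 1 always participates (matching the empty string when no prefix is present), so \1 is always defined.

```python
import re

# The trailing "|" adds an empty alternative, so group 1 always participates.
regexp = re.compile(r'^http://(lists\.|www\.|)example\.com/')

for url in ('http://example.com/helllo-there',
            'http://lists.example.com/helllo-there'):
    print(regexp.sub(r'https://\1example.com/', url))
# https://example.com/helllo-there
# https://lists.example.com/helllo-there
```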

If I understand correctly, just use x(y?)z instead of x(y)?z.
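
To make the difference concrete, here is a small sketch with the literal toy patterns x(y)?z and x(y?)z; the variable names are made up for illustration. With the quantifier outside, the group may not participate at all and group(1) is None; with the quantifier inside, the group always participates and captures an empty string.

```python
import re

optional_group = re.compile(r'x(y)?z')   # the group itself is optional
optional_inside = re.compile(r'x(y?)z')  # the group always participates

print(optional_group.match('xz').group(1))         # None
print(repr(optional_inside.match('xz').group(1)))  # ''
print(optional_inside.sub(r'[\1]', 'xz xyz'))      # [] [y]
```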

I would do it like this: put the pattern inside a non-capturing group and make that group optional, then wrap the optional non-capturing group in a capturing group.

>>> re.sub(r'^http://((?:lists\.|www\.)?)example\.com/',r'https://\g<1>example.com/', 'http://example.com/helllo-there')
'https://example.com/helllo-there'
