re.sub repl function returning \1 does not replace the group

I am trying to write a generic replace function for a regex sub operation in Python (trying in both 2 and 3), where the user can provide a regex pattern and a replacement for the match. The replacement could be anything from a simple string to one that uses the groups from the match.

In the end, I get from the user a dictionary in this form:

regex_dict = {pattern:replacement}

When I replace all occurrences of a pattern with the following call, replacements that refer to a group number (such as \\1) work:

re.sub(pattern, regex_dict[pattern], text)

This works as expected, but I need to do additional stuff when a match is found. Basically, what I am trying to achieve is as follows:

def replace_function(matchobj):
    # matchobj.re is the compiled pattern that produced this match
    result = regex_dict[matchobj.re]
    ##
    ## Do some other things
    ##
    return result

re.sub(pattern, replace_function, text)

I see that this works for plain replacements, but when the function is used, re.sub does not use the group information from the match: group references in the returned string are not replaced.

I also tried converting the \\1 pattern to \\g<1>, hoping that re.sub would understand it, but to no avail.
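A minimal sketch reproducing the behaviour, using a made-up pattern and text (these are assumptions for illustration, not the real data). When repl is a string, re.sub processes the backreferences in it. When repl is a function, whatever the function returns is inserted as-is.

```python
import re

# Hypothetical pattern and text, chosen only to show the difference.
pattern = re.compile(r"a(\w)c")
text = "abc"

# String replacement: re.sub expands the \1 backreference itself.
as_string = pattern.sub(r"\1\1", text)

# Callable replacement: the returned string is used literally,
# so the backslash sequences are not expanded.
as_callable = pattern.sub(lambda m: r"\1\1", text)

print(as_string)    # bb
print(as_callable)  # \1\1
```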

Am I missing something vital?

Thanks in advance!

Additional notes: I compile the patterns from byte strings, and the replacements are also bytes. I have non-Latin characters in my patterns, but I read everything as bytes, including the text that the regex substitution operates on.

EDIT: Just to clarify, I do not know in advance what kind of replacement the user will provide. It could be some combination of normal strings and groups, or just a string replacement.

SOLUTION

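def replace_function(matchobj):
    # Look up the replacement template for the pattern that matched
    repl = regex_dict[matchobj.re]
    ##
    ## Do some other things
    ##
    # expand() substitutes group references such as \1 in the template
    return matchobj.expand(repl)

re.sub(pattern, replace_function, text)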

I suspect you're after .expand. If you've got a compiled regex object (for instance), you can pass the replacement string to the match object's expand method, which substitutes the group references in it, e.g.:

import re

text = 'abc'
# This would be your key in the dict (raw string, so \w is not treated as a string escape)
rx = re.compile(r'a(\w)c')
# This would be the value for the key (the replacement string, eg: `\1\1\1`)
res = rx.match(text).expand(r'\1\1\1')
# res == 'bbb'
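Putting the pieces together, here is a rough end-to-end sketch with bytes patterns, as in the question. The dictionary contents, the sample text, and the match_count counter (standing in for the "other things" done per match) are all hypothetical. It assumes the dictionary is keyed by compiled pattern objects, so that matchobj.re can be used for the lookup.

```python
import re

# Hypothetical user-supplied mapping: compiled bytes pattern -> replacement template.
# One replacement uses a group reference, the other is a plain string.
regex_dict = {
    re.compile(rb"(\w+)@example\.com"): rb"<\1>",
    re.compile(rb"\d+"): b"N",
}

match_count = 0  # stand-in for the extra work done on every match


def replace_function(matchobj):
    global match_count
    match_count += 1
    # matchobj.re is the compiled pattern that produced this match.
    repl = regex_dict[matchobj.re]
    # expand() fills in \1, \g<1>, etc.; plain templates pass through unchanged.
    return matchobj.expand(repl)


text = b"bob@example.com has 42 items"
for pattern in regex_dict:
    text = pattern.sub(replace_function, text)

print(text)         # b'<bob> has N items'
print(match_count)  # 2
```

Because expand accepts the same template syntax as a string repl, the function does not need to know whether the user supplied a plain string or one with group references.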
