
re.sub in Python: does verbose mode not work with the replacement pattern?

Is there any way to get around this limitation of re.sub? Verbose mode is not fully functional in the replacement pattern (which contains a backreference here): whitespace and comments are not removed, although backreferences are interpreted properly.

import re

ft1=r"""(?P<test>[0-9]+)"""
ft2=r"""\g<test>and then: \g<test> #this remains"""

print(re.sub(ft1, ft2, "front 1234 back", flags=re.VERBOSE))  # Does not work
#result: front 1234and then: 1234 #this remains back

re.VERBOSE does not apply to the replacement pattern. Is there a workaround (something simpler than working with the groups after an re.match)?

Here is the only way I have found to "compile" a replacement expression for re.sub. There are a few extra constraints: spaces and newlines both have to be written the way spaces are written in a verbose match expression, that is, in square brackets ([ ] and [\n\n\n]), and the whole replacement expression should begin with a newline.

An example: this searches a string, detects a word repeated after <ins> and <del>, and replaces those occurrences with a single occurrence of the word in front of <ins>.

Both the match and the replacement expressions are complex, which is why I want a verbose version of the replacement expression.

===========================

import re

test = "<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>"


find=r"""
    <ins>
    (?P<front>[^<]+)          #there is something added that matches 
    (?P<delim1>[ .!,;:]+)     #get delimiter
    (?P<back1>[^<]*?)
    </ins>
    [ ]
    <del>
    (?P=front)
    (?P<delim2>[ .!,;:]+)
    (?P<back2>[^<]*?)
    </del>
"""
replace = r"""
    <<<<<\g<front>>>>>         #Pop out in front matching thing
    <ins>
    \g<delim1>
    \g<back1>
    </ins>
    [ ]     
    <del>    
    \g<delim2>             #put delimiters and backend back
    \g<back2>
    </del>
"""

flatReplace = r"""<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>"""


def compileRepl(inString):

    outString=inString
    #get space at front of line
    outString=re.sub(r"\n\s+","\n",outString)
    #get space at end of line
    outString=re.sub(r"\s+\n","",outString) 
    #get rid of comments
    outString=re.sub(r"\s*#[^\n]*\n","\n",outString)
    #preserve space in brackets, and eliminate brackets
    outString=re.sub(r"(?<!\[)\[(\s+)\](?!\[)",r"\1",outString)
    # get rid of newlines not in brackets
    outString=re.sub(r"(?<!\[)(\n)+(?!\])","",outString)
    #get rid of brackets around newlines
    outString=re.sub(r"\[((\\n)+)\]",r"\1",outString)
    #trim brackets    
    outString=re.sub(r"\[\[(.*?)\]\]","[\\1]",outString)
    return outString


assert(flatReplace == compileRepl(replace))


print(test)
print(compileRepl(replace))
print(re.sub(find, compileRepl(replace), test, flags=re.VERBOSE))

#<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>
#<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>
#<p>Le petit <<<<<homme>>>><ins> à</ins> <del> en</del> ressorts</p>
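
For comparison, a commented replacement string can also be written without any post-processing by relying on Python's implicit concatenation of adjacent string literals. This is a minimal sketch of an alternative, not the compileRepl approach above; it reuses the pattern and input from the question, and the wording of the comments is only illustrative.

```python
import re

find = re.compile(r"(?P<test>[0-9]+)")

# Adjacent string literals inside parentheses are joined by the Python
# compiler, so each fragment can carry an ordinary Python comment.
replace = (
    r"\g<test>"      # the matched digits
    r" and then: "   # literal text, spaces included
    r"\g<test>"      # the digits again
)

result = find.sub(replace, "front 1234 back")
print(result)
```

Because the comments live outside the string literals, re.sub never sees them, and spaces inside the fragments are kept exactly as written.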

You can first compile the regular expressions with re.compile, passing the re.VERBOSE flag there. Later, you can pass the compiled expressions as arguments to re.sub().
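
A minimal sketch of that suggestion, reusing the pattern from the question. Note that the flag only affects how the search pattern is parsed; the replacement string below is an ordinary string chosen for illustration, and re.sub uses it as written, without stripping whitespace or comments.

```python
import re

# Compile the search pattern once, in verbose mode.
find = re.compile(
    r"""
    (?P<test>[0-9]+)   # one or more digits, captured as "test"
    """,
    re.VERBOSE,
)

# The replacement is a plain template string; re.VERBOSE does not apply to it.
replacement = r"\g<test> and then: \g<test>"

# re.sub accepts a compiled pattern as its first argument.
result = re.sub(find, replacement, "front 1234 back")
print(result)
```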
