re.sub in Python: verbose mode does not work with the replacement pattern?

Is there any way to get around this limitation of re.sub? Verbose mode has no effect on the replacement pattern (here one that uses backreferences): whitespace and comments in it are not stripped, although the backreferences are interpreted properly.

import re

ft1=r"""(?P<test>[0-9]+)"""
ft2=r"""\g<test>and then: \g<test> #this remains"""

print(re.sub(ft1,ft2,"front 1234 back",flags=re.VERBOSE)) #Does not work
#result: front 1234and then: 1234 #this remains back

re.VERBOSE does not apply to the replacement pattern... Is there a work-around? (Simpler than working with groups after an re.match.)

Here is the only way I have found to "compile" a verbose replacement expression for re.sub. There are a few extra constraints: literal spaces and newlines have to be written in square brackets, the way a space is written in a verbose match expression ([ ] and [\\n\\n\\n]), and the whole replacement expression must begin with a newline.

An example: this searches a string for a word that is repeated at the start of an <ins> element and of the <del> element that follows it, then replaces those occurrences with a single occurrence of the word in front of the <ins> element.

Both the match and the replace expressions are complex, which is why I want a verbose version of the replace expression.

===========================

import re

test = "<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>"


find=r"""
    <ins>
    (?P<front>[^<]+)          #there is something added that matches 
    (?P<delim1>[ .!,;:]+)     #get delimiter
    (?P<back1>[^<]*?)
    </ins>
    [ ]
    <del>
    (?P=front)
    (?P<delim2>[ .!,;:]+)
    (?P<back2>[^<]*?)
    </del>
"""
replace = r"""
    <<<<<\g<front>>>>>         #Pop out in front matching thing
    <ins>
    \g<delim1>
    \g<back1>
    </ins>
    [ ]     
    <del>    
    \g<delim2>             #put delimiters and backend back
    \g<back2>
    </del>
"""

flatReplace = r"""<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>"""


def compileRepl(inString):

    outString=inString
    # strip leading whitespace on each line
    outString=re.sub(r"\n\s+","\n",outString)
    # strip trailing whitespace, together with the newline after it
    outString=re.sub(r"\s+\n","",outString) 
    # strip comments (and any whitespace before them)
    outString=re.sub(r"\s*#[^\n]*\n","\n",outString)
    #preserve space in brackets, and eliminate brackets (doubled brackets are left for the last step)
    outString=re.sub(r"(?<!\[)\[(\s+)\](?!\])",r"\1",outString)
    # get rid of newlines not in brackets
    outString=re.sub(r"(?<!\[)(\n)+(?!\])","",outString)
    #get rid of brackets around newlines
    outString=re.sub(r"\[((\\n)+)\]",r"\1",outString)
    #trim brackets    
    outString=re.sub(r"\[\[(.*?)\]\]","[\\1]",outString)
    return outString


assert flatReplace == compileRepl(replace)


print(test)
print(compileRepl(replace))
print(re.sub(find,compileRepl(replace),test, flags=re.VERBOSE))

#<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>
#<<<<<\g<front>>>>><ins>\g<delim1>\g<back1></ins> <del>\g<delim2>\g<back2></del>
#<p>Le petit <<<<<homme>>>><ins> à</ins> <del> en</del> ressorts</p>
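For comparison, here is a minimal sketch of a different workaround that needs no helper function. It is not the approach above: it relies on Python joining adjacent string literals at parse time, so each piece of the replacement can sit on its own line with an ordinary Python comment next to it. The names test, find and replace are reused from the example; result is only illustrative.

```python
import re

test = "<p>Le petit <ins>homme à</ins> <del>homme en</del> ressorts</p>"

find = r"""
    <ins>
    (?P<front>[^<]+)          # text shared by both edits
    (?P<delim1>[ .!,;:]+)     # delimiter that follows it
    (?P<back1>[^<]*?)
    </ins>
    [ ]
    <del>
    (?P=front)
    (?P<delim2>[ .!,;:]+)
    (?P<back2>[^<]*?)
    </del>
"""

# Adjacent string literals inside parentheses are concatenated by the
# parser, so the comments below never become part of the replacement.
replace = (
    r"<<<<<\g<front>>>>>"   # pop the shared text out in front
    r"<ins>"
    r"\g<delim1>"
    r"\g<back1>"
    r"</ins>"
    r" "                    # literal space between the two elements
    r"<del>"
    r"\g<delim2>"           # put delimiters and tails back
    r"\g<back2>"
    r"</del>"
)

result = re.sub(find, replace, test, flags=re.VERBOSE)
print(result)
# <p>Le petit <<<<<homme>>>><ins> à</ins> <del> en</del> ressorts</p>
```

The trade-off is that whitespace is literal inside each piece, so a space or newline has to be written as its own quoted fragment rather than in brackets.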

You can first use re.compile to compile the regular expression, and pass the re.VERBOSE flag there. Later, you can pass the compiled expression as an argument to re.sub().
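A minimal sketch of that suggestion, reusing the pattern from the question. The names number, template and result are illustrative. Note that re.VERBOSE only affects the compiled search pattern here; the replacement is still a plain template, so it is written without layout or comments.

```python
import re

# Compile the search pattern once, with re.VERBOSE so that whitespace
# and comments inside it are ignored.
number = re.compile(r"""
    (?P<test>[0-9]+)   # one or more digits, captured as "test"
""", re.VERBOSE)

# The replacement template is taken literally apart from backreferences.
template = r"\g<test> and then: \g<test>"

# re.sub accepts a compiled pattern as its first argument.
result = re.sub(number, template, "front 1234 back")
print(result)  # front 1234 and then: 1234 back
```

The same call can be written as number.sub(template, "front 1234 back").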
