
Python regex - Replace all characters except those between braces

I'm a bit stuck with a regular expression. I have a string in the format

{% 'ello %} wor'ld {% te'st %}

and I want to escape only apostrophes that aren't between {% ... %} tags, so the expected output is

{% 'ello %} wor&quot;ld {% te'st %}

I know I can replace all of them with the string replace function, but I'm at a loss as to how to use a regex to match only the ones outside the braces.

This can probably be done with regex, but it would be a complicated one. It's easier to write and read if you just do it directly:

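def escape(s):
    in_tag = False  # True while we are between {% and %}
    ret = []
    for i, ch in enumerate(s):
        # Escape apostrophes only outside {% ... %} tags.
        if not in_tag and ch == "'":
            ret.append("&quot;")
        else:
            ret.append(ch)

        # Update the state after handling the current character.
        if in_tag and s[i:i+2] == "%}":
            in_tag = False
        elif not in_tag and s[i:i+2] == "{%":
            in_tag = True

    return "".join(ret)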

Just for fun, this is the way to do it with regex:

>>> import re
>>> instr = "{% 'ello %} wor'ld {% te'st %}"
>>> re.sub(r'\'(?=(.(?!%}))*({%|$))', r'&quot;', instr)
"{% 'ello %} wor&quot;ld {% te'st %}"

It uses a positive lookahead to require that the apostrophe is followed by either {% or the end of the string, and a negative lookahead inside that positive lookahead to make sure no %} is crossed on the way there.
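To make the nested lookaheads easier to follow, here is a sketch of the same pattern written with re.VERBOSE and one comment per part. The name OUTSIDE_APOSTROPHE and the &quot; replacement text are assumptions for illustration, and the sketch assumes single-line input, since . does not match newlines by default.

```python
import re

# The look-ahead pattern, spelled out piece by piece.
OUTSIDE_APOSTROPHE = re.compile(r"""
    '                     # an apostrophe ...
    (?=                   # ... that is followed, without consuming anything, by
        (?: . (?!%}) )*   #   characters, none of them directly followed by a closing tag
        (?: {% | $ )      #   and then the next opening tag or the end of the string
    )
""", re.VERBOSE)

result = OUTSIDE_APOSTROPHE.sub("&quot;", "{% 'ello %} wor'ld {% te'st %}")
print(result)
```

An apostrophe inside a tag fails the check because the scan forward would have to step over a character that sits right before %}, so it can never reach the next {% or the end of the string.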

If you want to use a regular expression, you could also do it like this:

>>> import re
>>> s = """'{% 'ello %} wor'ld {% te'st %}'"""
>>> segments = re.split(r'(\{%.*?%\})', s)
>>> for i in range(0, len(segments), 2):
...     segments[i] = segments[i].replace('\'', '&quot;')
...
>>> ''.join(segments)
"&quot;{% 'ello %} wor&quot;ld {% te'st %}&quot;"

Compared with Ehsan's lookahead solution, this has the benefit that you can run any kind of replacement or analysis on the segments without having to run another regular expression. So if you decide to replace another character as well, you can easily do that in the same loop.
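As a rough illustration of that point, the split can be wrapped in a small helper that applies several replacements to the text outside the tags. The helper name transform_outside_tags and the extra & replacement are hypothetical and not part of the original answer.

```python
import re

# Capturing group, so re.split keeps the tags in the result list.
TAG = re.compile(r"({%.*?%})")

def transform_outside_tags(text, replacements):
    """Apply each (old, new) pair only to the text outside {% ... %} tags."""
    segments = TAG.split(text)
    # Even indices hold text outside tags; odd indices hold the tags themselves.
    for i in range(0, len(segments), 2):
        for old, new in replacements:
            segments[i] = segments[i].replace(old, new)
    return "".join(segments)

# "&" is replaced first so the "&" in "&quot;" is not escaped a second time.
print(transform_outside_tags(
    "{% 'ello %} wor'ld & {% te'st & %}",
    [("&", "&amp;"), ("'", "&quot;")],
))
```

Adding another replacement is then just one more pair in the list, with no change to the regular expression.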

bcloughlan, I'm resurrecting this question because it has a simple solution that wasn't mentioned. (I found your question while doing some research for a general question about how to exclude patterns in regex.)

Here's a simple regex:

{%.*?%}|(\')

The left side of the alternation matches complete {% ... %} tags. We will ignore these matches. The right side matches and captures apostrophes to Group 1, and we know they are the right apostrophes because they were not matched by the expression on the left.

This program shows how to use the regex (see the results in the online demo):

import re
subject = "{% 'ello %} wor'ld {% te'st %}"
regex = re.compile(r'{%.*?%}|(\')')
def myreplacement(m):
    if m.group(1):
        return "&quot;"
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)
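To see how the alternation behaves on a few other inputs, here is a compact sketch of the same idea with a lambda as the replacement callback. The names TAG_OR_APOSTROPHE and escape_outside_tags, the sample strings, and the &quot; replacement text are assumptions for illustration.

```python
import re

TAG_OR_APOSTROPHE = re.compile(r"{%.*?%}|(')")

def escape_outside_tags(text):
    # Group 1 is set only when the apostrophe branch matched; otherwise the
    # match is a whole tag and is put back unchanged.
    return TAG_OR_APOSTROPHE.sub(
        lambda m: "&quot;" if m.group(1) else m.group(0), text
    )

samples = [
    "{% 'ello %} wor'ld {% te'st %}",
    "it's",
    "{% a'%}'{% 'b %}",
    "no tags, no quotes",
]
for sample in samples:
    print(repr(sample), "->", repr(escape_outside_tags(sample)))
```

Because the regex engine tries the tag branch first at each position, an apostrophe can only reach Group 1 when it is not part of a complete {% ... %} tag.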

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...
