
Python 3 regular expression for $ but not $$ in a string

I need to match one of the following anywhere in a string:

${aa:bb[99]}
${aa:bb}
${aa}

but not:

$${aa:bb[99]}
$${aa:bb}
$${aa}

My Python 3 regex is:

pattern = r"[^\$|/^]\$\{(?P<section>[a-zA-Z]+?\:)?(?P<key>[a-zA-Z]+?)(?P<value>\[[0-9]+\])?\}"

What I'm looking for is the proper way to say "not $, or the beginning of the string". The block r"[^\$|/^]" properly detects all the other cases but fails when the token starts at the first character of the string.
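A minimal reproduction of the failure described above, as a sketch. The pattern is the one posted in the question; the sample string with a leading "x " is made up for illustration.

```python
import re

# The leading [^\$|/^] is a character class: it must consume exactly one
# character that is not $, |, / or ^. It cannot match "nothing at all".
pattern = re.compile(
    r"[^\$|/^]\$\{(?P<section>[a-zA-Z]+?\:)?(?P<key>[a-zA-Z]+?)(?P<value>\[[0-9]+\])?\}"
)

print(pattern.search("x ${aa:bb[99]}"))  # matches, and the match includes the space
print(pattern.search("${aa:bb[99]}"))    # None: no character precedes the token
print(pattern.search("$${aa}"))          # None: the doubled form is skipped as intended
```

Inside the brackets, | and ^ are literal characters rather than alternation or an anchor, which is why none of the bracketed variants can express "start of string".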

I tried, without success:

r"[^\$|\b]... 
r"[^\$|\B]...
r"[^\$]...
r"[^\$|^] 

Any suggestion?

Use a negative lookbehind:

(?<!\$)

and then follow it with the thing you actually want to match. This ensures that the thing you actually want to match is not preceded by a $ (i.e. not preceded by a match for \$):

(?<!\$)\$\{(?P<section>[a-zA-Z]+?\:)?(?P<key>[a-zA-Z]+?)(?P<value>\[[0-9]+\])?\}
     ^  ^
     |  |
     |  +--- The dollar sign you actually want to match
     |
     +--- The possible second preceding dollar sign you want to exclude

(?<!...)

Matches if the current position in the string is not preceded by a match for ... . This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length and shouldn't contain group references. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.

https://docs.python.org/3/library/re.html
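As a quick check of the lookbehind pattern above, here is a small sketch that runs it against the six strings from the question. The extra sample "x ${aa} y" is an assumption added to show a token in the middle of a string.

```python
import re

pattern = re.compile(
    r"(?<!\$)\$\{(?P<section>[a-zA-Z]+?\:)?(?P<key>[a-zA-Z]+?)(?P<value>\[[0-9]+\])?\}"
)

wanted = ["${aa:bb[99]}", "${aa:bb}", "${aa}", "x ${aa} y"]
unwanted = ["$${aa:bb[99]}", "$${aa:bb}", "$${aa}"]

for s in wanted + unwanted:
    mo = pattern.search(s)
    # The lookbehind consumes nothing, so a match always starts at the $ itself.
    print(f"{s!r:18} -> {mo.group() if mo else None}")

# With the groups as written in the question, "section" keeps its trailing
# colon and "value" keeps its brackets.
print(pattern.search("${aa:bb[99]}").groupdict())
# {'section': 'aa:', 'key': 'bb', 'value': '[99]'}
```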

You can use a negative lookbehind (?<!\$) to say "not preceded by $":

(?<!\$)\${[^}]*}

I have simplified the part between the brackets a bit to focus on the "one and only one $ part".
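A short sketch of the simplified pattern in Python, using re.findall to pull every token out of one string. The sample text is made up for illustration.

```python
import re

simple = re.compile(r"(?<!\$)\${[^}]*}")

text = "${aa} and $${bb} and ${cc:dd[7]}"
print(simple.findall(text))   # ['${aa}', '${cc:dd[7]}']

# The trade-off of the simplification: [^}]* accepts anything between the
# braces, including nothing at all.
print(simple.findall("${}"))  # ['${}']
```

If the section, key and index need to be validated or captured, the stricter body from the question can be put after the same lookbehind.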

Here is a regex101 link .

Thank you, Amber, for the ideas. I followed the same train of thought you suggested, using a negative lookbehind. I tried them all with https://regex101.com/r/G2n0cO/1/ . The only one that succeeded almost perfectly is:

(?:^|[^\$])\${(?:(?P<section>[a-zA-Z0-9\-_]+?)\:)??(?P<key>[a-zA-Z0-9\-_]+?)(?:\[(?P<index>[0-9]+?)\])??\}

I still had to add a check to skip the leading non-dollar character that this pattern also captures; it is at the end of the sample below. For the record, I kept a few of the iterations I made since I posted this question:

    # keep tokens ${[section:][key][\[index\]]} and skip false ones
    # pattern = r"\$\{((?P<section>.+?)\:)?(?P<key>.+?)(\[(?P<index>\d+?)\])+?\}" 
    # pattern = r'\$\{((?P<section>\S+?)\:)??(?P<key>\S+?)(\[(?P<index>\d+?)\])?\}'
    # pattern = r'\$\{((?P<section>[a-zA-Z0-9\-_]+?)\:)??(?P<key>[a-zA-Z0-9\-_]+?)(\[(?P<index>[0-9]+?)\])??\}'
    pattern = r'(?:^|[^\$])\${(?:(?P<section>[a-zA-Z0-9\-_]+?)\:)??(?P<key>[a-zA-Z0-9\-_]+?)(?:\[(?P<index>[0-9]+?)\])??\}'

    analyser = re.compile(pattern)
    mo = analyser.search(value, 0)
    log.debug(f'got match object: {mo}')
    while mo is not None:
        log.debug(f'in while loop, level={level}')

        # guard against tokens that keep expanding into more tokens
        if level > MAX_LEVEL:
            raise RecursionError(f"too many recursive calls to _substiture_text() while processing '{value}'.")
        level += 1

        start = mo.start()
        end = mo.end()
        # the (?:^|[^\$]) prefix also captures the character before the $ sign,
        # so skip it unless the match begins at the start of the string
        if value[start] != '$':
            start += 1
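For comparison, a sketch of how the consuming prefix (?:^|[^\$]) and the lookbehind (?<!\$) differ on the same token body. The names BODY, consuming and lookbehind, and the sample strings, are assumptions introduced here for illustration; they are not part of the original code.

```python
import re

# Token body copied from the pattern above; only the prefix differs.
BODY = (r"\${(?:(?P<section>[a-zA-Z0-9\-_]+?)\:)??"
        r"(?P<key>[a-zA-Z0-9\-_]+?)(?:\[(?P<index>[0-9]+?)\])??\}")
consuming = re.compile(r"(?:^|[^\$])" + BODY)
lookbehind = re.compile(r"(?<!\$)" + BODY)

value = "x ${aa:bb[9]}"
print(consuming.search(value).span())   # (1, 13): starts at the space before $
print(lookbehind.search(value).span())  # (2, 13): starts at the $ itself

# Because the consuming prefix needs a character of its own, a token that
# directly follows another token is missed in a single left-to-right scan.
adjacent = "${a}${b}"
print([m.group() for m in consuming.finditer(adjacent)])   # ['${a}']
print([m.group() for m in lookbehind.finditer(adjacent)])  # ['${a}', '${b}']
```

With the lookbehind version, mo.start() already points at the $ sign, so the `value[start] != '$'` adjustment would not be needed.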
