
Regex to capture named groups in any order

I have a scenario in which I need to use a single call to Python's re.sub() to find and replace items in a string. If that constraint sounds contrived, just consider this a mental exercise, but know that it is a real-life constraint I have to work with.

I want to match and replace a line like either of these:

foo -some-arg -o %output %input
foo %input -other-random-arg=baz -o %output

with this:

bar %output %input.out

The file names %input and %output can be anything matching [a-zA-Z0-9._-]+, but they are always preceded by %.
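The file-name rule translates directly into a regex. A small sketch, assuming a name is exactly % followed by one or more characters from [a-zA-Z0-9._-], finds both names easily. The difficulty is only that their order differs between the two example lines.

```python
import re

# A file name: a literal % followed by one or more allowed characters.
FILENAME = re.compile(r'%[a-zA-Z0-9._-]+')

first = FILENAME.findall('foo -some-arg -o %output %input')
second = FILENAME.findall('foo %input -other-random-arg=baz -o %output')
print(first)   # ['%output', '%input']
print(second)  # ['%input', '%output']
```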

I came up with this substitution, which does not quite work:

    r'''(?x)                     # Begin verbose regex
        foo[ ]                   # foo and a space
        (?=.*?-o[ ]                  # Lookahead for the first occurrence of -o
            (?P<a>%\S+\b)                # Output filename -> Group 'a'
        )
        (?=.*?                       # Lookahead from the same place as the first lookahead
                                     # so the two filenames can match in any order.
            (?!-o[ ]%\S+\b)              # Do not match the output file
            (?P<b>%\S+\b)                # Any filename -> Group 'b'
        ).*                      # Match anything ''',
    r'bar \g<b> \g<a>.out'       # Replacement

I often end up with one of the two file names appearing twice, like:

bar %output %output.out
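The failure can be reproduced with a short script, sketched here under the assumption that the regex and replacement above are passed to re.sub unchanged (the pattern is condensed to one line without the verbose comments). The negative lookahead (?!-o[ ]%\S+\b) is tested at the position where group b starts. Since b must begin with %, "-o " can never match there, so the lookahead excludes nothing. Group b therefore captures the first %-prefixed token on the line, which in the first example is the output file.

```python
import re

# The question's regex, condensed to one line (same lookaheads, no verbose comments).
ATTEMPT = r'foo[ ](?=.*?-o[ ](?P<a>%\S+\b))(?=.*?(?!-o[ ]%\S+\b)(?P<b>%\S+\b)).*'
REPLACEMENT = r'bar \g<b> \g<a>.out'

line = 'foo -some-arg -o %output %input'
# Group a is %output (after "-o "); group b is also %output, because the
# negative lookahead is checked at the "%" and never sees the preceding "-o ".
result = re.sub(ATTEMPT, REPLACEMENT, line)
print(result)  # bar %output %output.out
```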

Is there a way to capture the two file names into named groups in whatever order they appear? It seems that if I could advance the regex engine's position after matching one of the lookaheads, I could make this work.

Since every argument other than the two file names begins with a dash, and the input and output files each appear exactly once, you can use a pattern like this one, which does not care about their order:

foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+

and the replacement

bar \1 \2.out
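A quick check of this pattern and replacement against both example lines, as a sketch. It relies on Python's re module keeping the last value captured by each named group even when a later iteration of the repeated (?:...)+ group takes a different alternative. The order of the alternatives also matters: the -o branch has to be tried before the generic -\S+ branch, otherwise -o would be consumed as an ordinary option.

```python
import re

# The answer's pattern: each iteration of the group consumes one space-separated token.
# The -o branch must come before the generic dash branch.
PATTERN = r'foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+'

lines = [
    'foo -some-arg -o %output %input',
    'foo %input -other-random-arg=baz -o %output',
]

# \1 is the "output" group and \2 the "input" group (numbered by opening parenthesis).
numbered = [re.sub(PATTERN, r'bar \1 \2.out', line) for line in lines]
# Named references give the same result without depending on group numbering.
named = [re.sub(PATTERN, r'bar \g<output> \g<input>.out', line) for line in lines]
print(numbered)  # ['bar %output %input.out', 'bar %output %input.out']
```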

Note: if you need to handle file names that contain spaces (escaped with a backslash on the command line), replace \S+ with (?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+) in the input and output groups only.
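A sketch of the pattern with that escaped-name subpattern substituted into the output and input groups. The sample line, with backslash-escaped spaces in both names, is made up for illustration. Because the subpattern uses only non-capturing groups, \1 and \2 still refer to output and input.

```python
import re

# A file name that may contain backslash escapes such as "\ " (an escaped space):
# runs of non-space, non-backslash characters, optionally joined by "\<any char>".
FILENAME = r'(?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+)'

# Only the output and input groups use FILENAME; other options still use -\S+.
PATTERN = rf'foo(?: -o (?P<output>{FILENAME})| -\S+| (?P<input>{FILENAME}))+'

line = r'foo -some-arg -o %my\ output %my\ input'
result = re.sub(PATTERN, r'bar \1 \2.out', line)
print(result)  # bar %my\ output %my\ input.out
```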


 