I have a scenario in which I need to use a single call to Python's re.sub() to find and replace items in a string. If that constraint sounds contrived, just consider this a mental exercise, but know that it is a real-life constraint I have to work with.
I want to match and replace a line like either of these:
foo -some-arg -o %output %input
foo %input -other-random-arg=baz -o %output
with this:
bar %output %input.out
The file names %input and %output can be anything matching [a-zA-Z0-9._-]+ but are always preceded by %.
I came up with this substitution, which does not quite work:
r'''(?x) # Begin verbose regex
foo[ ] # foo and a space
(?=.*?-o[ ] # Lookahead for the first occurrence of -o
(?P<a>%\S+\b) # Output filename -> Group 'a'
)
(?=.*? # Lookahead from the same place as the first lookahead
# so the two filenames can match in any order.
(?!-o[ ]%\S+\b) # Do not match the output file
(?P<b>%\S+\b) # Any filename -> Group 'b'
).* # Match anything ''',
r'bar \g<b> \g<a>.out' # Replacement
I often end up with one of the two file names repeated twice like:
bar %output %output.out
Is there a way to capture the two file names in named groups, in whatever order they appear? It seems that if I could advance the regex engine's pointer upon matching one of the lookaheads, I could make this work.
Since all arguments begin with a dash, and since the input and output each appear exactly once, you can use this kind of pattern, which ignores the order:
foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+
and the replacement
bar \1 \2.out
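As a quick check, here is a minimal sketch of that pattern in a single re.sub() call, run against the two sample lines from the question. The helper name rewrite is only for illustration. The replacement uses \g<output> and \g<input>, which refer to the same groups as \1 and \2.

```python
import re

# Pattern from the answer: each repetition consumes one space-separated token,
# which is either "-o <output>", some other dash option, or the input name.
pattern = r'foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+'

# Named references; equivalent to r'bar \1 \2.out' for this pattern.
replacement = r'bar \g<output> \g<input>.out'


def rewrite(line):
    """Hypothetical helper: apply the substitution with one re.sub() call."""
    return re.sub(pattern, replacement, line)


print(rewrite('foo -some-arg -o %output %input'))
# bar %output %input.out
print(rewrite('foo %input -other-random-arg=baz -o %output'))
# bar %output %input.out
```

This works because, in Python, a group inside a repeated construct keeps the value from the last iteration in which it took part, so both names are still available once the whole line has been consumed.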
Note: if you want to deal with filenames that contain spaces (escaped with a backslash on the command line), you need to change \S+ (only for input and output) to
(?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+)
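The escaped-space variant can be sketched the same way. The names NAME and rewrite_escaped, and the sample line, are assumptions made for illustration. The sub-pattern is written with single backslashes inside a Python raw string.

```python
import re

# A file name: runs of non-space, non-backslash characters, optionally
# interleaved with backslash-escaped characters (such as "\ ").
NAME = r'(?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+)'

# Same structure as the simpler pattern, with NAME in place of \S+
# for the input and output groups only.
pattern = (
    r'foo(?: -o (?P<output>' + NAME + r')'
    r'| -\S+'
    r'| (?P<input>' + NAME + r'))+'
)
replacement = r'bar \g<output> \g<input>.out'


def rewrite_escaped(line):
    """Hypothetical helper: one re.sub() call, tolerating escaped spaces."""
    return re.sub(pattern, replacement, line)


print(rewrite_escaped(r'foo -v -o %my\ output %my\ input'))
# bar %my\ output %my\ input.out
```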