Regex to capture named groups in any order
I have a scenario in which I need to use a single call to Python's re.sub() to find and replace items in a string. If that constraint sounds contrived, just consider this a mental exercise, but know that it is a real-life constraint I have to work with.
I want to match and replace a line like either of these:
foo -some-arg -o %output %input
foo %input -other-random-arg=baz -o %output
with this:
bar %output %input.out
The file names %input and %output can be anything matching [a-zA-Z0-9._-]+ but are always preceded by %.
I came up with this substitution, which does not quite work.
r'''(?x) # Begin verbose regex
foo[ ] # foo and a space
(?=.*?-o[ ] # Lookahead for the first occurrence of -o
(?P<a>%\S+\b) # Output filename -> Group 'a'
)
(?=.*? # Lookahead from the same place as the first lookahead
# so the two filenames can match in any order.
(?!-o[ ]%\S+\b) # Do not match the output file
(?P<b>%\S+\b) # Any filename -> Group 'b'
).* # Match anything ''',
r'bar \g<b> \g<a>.out' # Replacement
I often end up with one of the two file names repeated, like:
bar %output %output.out
Is there a way to named-capture the two file names in whatever order they appear? It seems that if I could advance the regex engine's pointer upon matching one of the lookaheads, I could make this work.
Since all other arguments begin with a dash, and since input and output are each present exactly once, you can use this kind of pattern, which ignores the order:
foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+
and the replacement (group 1 is output, group 2 is input):
bar \1 \2.out
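
As a quick check, here is a minimal sketch that runs the pattern above through a single re.sub() call on both example lines from the question. The variable names are illustrative only, and the replacement uses the named form \g<output> and \g<input>, which should be equivalent to \1 and \2 here.

```python
import re

# Pattern from the answer: each repetition consumes one space-separated token.
# " -o <file>" fills the 'output' group, any other dashed token is skipped,
# and a bare token fills the 'input' group.
pattern = r"foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+"

# Named references; equivalent to r"bar \1 \2.out" for this pattern.
replacement = r"bar \g<output> \g<input>.out"

lines = [
    "foo -some-arg -o %output %input",
    "foo %input -other-random-arg=baz -o %output",
]

# One re.sub() call per line, as the question requires.
results = [re.sub(pattern, replacement, line) for line in lines]

for result in results:
    print(result)  # bar %output %input.out
```

This works because a capture group inside a repeated group keeps the value from the last iteration in which it took part, so the order in which the two file names appear does not matter.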
Note: if you want to deal with filenames that contain spaces (escaped with a backslash on the command line), you need to change \S+
to (?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+)
(only for input and output).
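
The note above can be sketched as follows, assuming the sub-pattern is written with single backslashes inside a Python raw string. The name FILENAME and the sample command line are hypothetical and only serve the illustration.

```python
import re

# Sub-pattern for a file name that may contain backslash-escaped characters
# (for example "\ " for an escaped space). FILENAME is an illustrative name.
FILENAME = r"(?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+)"

# Same structure as the answer's pattern, with \S+ swapped out
# only in the input and output groups.
pattern = (
    r"foo(?: -o (?P<output>" + FILENAME + r")"
    r"| -\S+"
    r"| (?P<input>" + FILENAME + r"))+"
)

# Hypothetical command line with escaped spaces in both file names.
line = r"foo -x -o %my\ output %my\ input"

result = re.sub(pattern, r"bar \g<output> \g<input>.out", line)
print(result)  # bar %my\ output %my\ input.out
```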