Regex to capture named groups in any order

I have a scenario in which I need to use a single call to Python's re.sub() to find and replace items in a string. If that constraint sounds contrived, just consider this a mental exercise, but know that it is a real-life constraint I have to work with.

I want to match and replace a line like either of these:

foo -some-arg -o %output %input
foo %input -other-random-arg=baz -o %output

with this:

bar %output %input.out

The file names %input and %output can be anything matching [a-zA-Z0-9._-]+ but are always preceded by %.

I came up with this substitution, which does not quite work:

    r'''(?x)                     # Begin verbose regex
        foo[ ]                   # foo and a space
        (?=.*?-o[ ]                  # Lookahead for the first occurrence of -o
            (?P<a>%\S+\b)                # Output filename -> Group 'a'
        )
        (?=.*?                       # Lookahead from the same place as the first lookahead
                                     # so the two filenames can match in any order.
            (?!-o[ ]%\S+\b)              # Do not match the output file
            (?P<b>%\S+\b)                # Any filename -> Group 'b'
        ).*                      # Match anything ''',
    r'bar \g<b> \g<a>.out'       # Replacement

I often end up with one of the two file names repeated, like:

bar %output %output.out

Is there a way to named-capture the two file names in whatever order they appear? It seems that if I could advance the regex engine's pointer upon matching one of the lookaheads, I could make this work.

Since all arguments begin with a dash, and since the input and output each appear exactly once, you can use this kind of pattern, which ignores the order:

foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+

and the replacement:

bar \1 \2.out
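A minimal sketch of how this pattern and replacement might be used in a single re.sub() call. The names PATTERN, REPLACEMENT and rewrite are made up for illustration, and the replacement is written with named references, which are equivalent to \1 and \2 here because output is group 1 and input is group 2:

```python
import re

# Pattern from the answer: every repetition of the group consumes one
# space-separated token, so the tokens may appear in any order.
PATTERN = re.compile(r'foo(?: -o (?P<output>\S+)| -\S+| (?P<input>\S+))+')

# Named references; same result as r'bar \1 \2.out'.
REPLACEMENT = r'bar \g<output> \g<input>.out'


def rewrite(line: str) -> str:
    """Hypothetical helper: apply the substitution with one re.sub call."""
    return PATTERN.sub(REPLACEMENT, line)


print(rewrite('foo -some-arg -o %output %input'))
print(rewrite('foo %input -other-random-arg=baz -o %output'))
```

Both calls should print bar %output %input.out. The -o alternative is listed first so that -o and its file name are consumed together before the generic dash-argument alternative gets a chance to match.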

Note: if you want to deal with filenames that contain spaces (escaped with a backslash on the command line), you need to change \S+ to (?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+) (only for input and output).
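As a rough illustration of that note, the sketch below swaps the longer sub-pattern in for \S+ in the two capturing groups only. NAME, PATTERN_ESC and rewrite_escaped are hypothetical names, and the sample command line is invented for the example:

```python
import re

# Sub-pattern from the note: a token that may contain backslash-escaped
# characters, such as an escaped space.
NAME = r'(?:[^\s\\]+(?:\\.[^\s\\]*)*|[^\s\\]*(?:\\.[^\s\\]*)+)'

# Same structure as the answer's pattern; only the input and output groups
# use NAME, other dash arguments still use \S+.
PATTERN_ESC = re.compile(
    r'foo(?: -o (?P<output>' + NAME + r')| -\S+| (?P<input>' + NAME + r'))+'
)


def rewrite_escaped(line: str) -> str:
    """Hypothetical helper: one re.sub call that tolerates escaped spaces."""
    return PATTERN_ESC.sub(r'bar \g<output> \g<input>.out', line)


# Raw string, so each backslash below is a literal backslash in the input.
print(rewrite_escaped(r'foo -x -o %my\ output %my\ input'))
```

The captured text is inserted into the result unchanged, so the escaping backslashes are preserved in the rewritten line.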
