
Reuse part of a Regex pattern

Consider this (very simplified) example string:

1aw2,5cx7

As you can see, it is two digit/letter/letter/digit values separated by a comma.

Now, I could match this with the following:

>>> from re import match
>>> match(r"\d\w\w\d,\d\w\w\d", "1aw2,5cx7")
<_sre.SRE_Match object at 0x01749D40>
>>>

The problem, though, is that I have to write \d\w\w\d twice. With small patterns this isn't so bad, but with more complex regexes, writing the exact same thing twice makes the final pattern enormous and cumbersome to work with. It also seems redundant.

I tried using a named capture group:

>>> from re import match
>>> match(r"(?P<id>\d\w\w\d),(?P=id)", "1aw2,5cx7")
>>>

But it didn't work because it was looking for two occurrences of 1aw2, not digit/letter/letter/digit.

Is there any way to save part of a pattern, such as \d\w\w\d, so it can be used later on in the same pattern? In other words, can I reuse a sub-pattern in a pattern?

No, when using the standard library re module, regular expression patterns cannot be 'symbolized'.

You can always get the same effect by re-using Python variables, of course:

digit_letter_letter_digit = r'\d\w\w\d'

then use string formatting to build the larger pattern:

match(r"{0},{0}".format(digit_letter_letter_digit), inputtext)

or, using Python 3.6+ f-strings:

dlld = r'\d\w\w\d'
match(fr"{dlld},{dlld}", inputtext)

I often use this technique to compose larger, more complex patterns from re-usable sub-patterns.
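As a minimal sketch of that composition idea (the names DLLD, SEP and pair are illustrative, not from the original answer), each sub-pattern can be wrapped in a non-capturing group when it is interpolated, so that any alternation or quantifier inside it stays contained:

```python
import re

# Reusable sub-patterns (illustrative names).
DLLD = r"\d\w\w\d"
SEP = r"\s*,\s*"

# Wrap each use in a non-capturing group so that alternation or
# quantifiers inside a sub-pattern cannot leak into the outer pattern.
pair = re.compile(rf"(?:{DLLD}){SEP}(?:{DLLD})")

print(pair.fullmatch("1aw2,5cx7") is not None)    # True
print(pair.fullmatch("1aw2 , 5cx7") is not None)  # True
print(pair.fullmatch("1aw2,5cx") is not None)     # False
```

Keeping the sub-patterns in plain raw strings also means regex braces such as {2} inside them do not need doubling; only braces typed directly in the f-string literal do.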

If you are prepared to install an external library, then the regex project can solve this problem with a regex subroutine call. The syntax (?N), where N is a group number, re-uses the pattern of an already used (implicitly numbered) capturing group:

(\d\w\w\d),(?1)
^........^ ^..^
|           \
|             re-use pattern of capturing group 1  
\
  capturing group 1

You can do the same with named capturing groups, where (?<groupname>...) is the named group groupname, and (?&groupname), (?P&groupname) or (?P>groupname) re-use the pattern of groupname (the latter two forms are alternatives for compatibility with other engines).

And finally, regex supports the (?(DEFINE)...) block to 'define' subroutine patterns without them actually matching anything at that stage. You can put multiple (...) and (?<name>...) capturing groups in that construct and then refer to them later in the actual pattern:

(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)
          ^...............^ ^......^ ^......^
          |                    \       /          
 creates 'dlld' pattern      uses 'dlld' pattern twice

Just to be explicit: the standard library re module does not support subroutine patterns.

Note: this will work with the PyPI regex module, not with the re module.
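For anyone who wants to try the three forms side by side, here is a small sketch. It assumes the third-party regex package is installed (pip install regex) and skips that part if it is not; the variable names are mine, not part of either library.

```python
import re

try:
    import regex  # third-party package: pip install regex
except ImportError:
    regex = None  # the regex-specific part below is skipped

text = "1aw2,5cx7"

# The standard library rejects the subroutine syntax at compile time.
try:
    re.compile(r"(\d\w\w\d),(?1)")
    stdlib_supports_subroutines = True
except re.error:
    stdlib_supports_subroutines = False
print(stdlib_supports_subroutines)

if regex is not None:
    patterns = [
        r"(\d\w\w\d),(?1)",                                # numbered group
        r"(?<dlld>\d\w\w\d),(?&dlld)",                     # named group
        r"(?(DEFINE)(?<dlld>\d\w\w\d))(?&dlld),(?&dlld)",  # DEFINE block
    ]
    results = [regex.fullmatch(p, text) is not None for p in patterns]
    print(results)
```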

You could use the notation (?group-number); in your case:

(\d\w\w\d),(?1)

It is equivalent to:

(\d\w\w\d),(\d\w\w\d)

Be aware that \w also matches digits (it includes \d) and the underscore. To allow only letters in the middle, the regex would be:

(\d[a-zA-Z]{2}\d),(?1)
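The difference is easy to check with the standard library alone. This sketch builds both variants by string interpolation rather than with (?1); the names loose and strict are illustrative.

```python
import re

loose = r"\d\w\w\d"           # \w also matches digits and "_"
strict = r"\d[a-zA-Z]{2}\d"   # only ASCII letters in the middle

loose_pair = re.compile(rf"{loose},{loose}")
strict_pair = re.compile(rf"{strict},{strict}")

print(bool(loose_pair.fullmatch("1234,5_67")))   # True
print(bool(strict_pair.fullmatch("1234,5_67")))  # False
print(bool(strict_pair.fullmatch("1aw2,5cx7")))  # True
```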

I was troubled by the same problem and wrote this snippet:

import nre
my_regex=nre.from_string('''
a=\d\w\w\d
b={{a}},{{a}}
c=(?P<id>{{a}}),(?P=id)
''')
my_regex["b"].match("1aw2,5cx7")

For lack of a more descriptive name, I named the partial regexes a, b and c.

Accessing them is as easy as {{a}}.
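I can't vouch for how nre does this internally, but the same name=pattern idea can be sketched with the standard library only. The function expand_definitions below is a hypothetical helper written for illustration; it is not the nre API.

```python
import re

def expand_definitions(text):
    """Parse 'name=pattern' lines and expand {{name}} placeholders.

    Hypothetical helper for illustration; not the nre API.
    A name must be defined on an earlier line before it is used.
    """
    patterns = {}
    for line in text.strip().splitlines():
        # Split on the first "=" only, so "=" inside a pattern is kept.
        name, _, body = line.partition("=")
        # A function replacement avoids escape processing of the
        # backslashes in the substituted pattern text.
        body = re.sub(r"\{\{(\w+)\}\}", lambda m: patterns[m.group(1)], body)
        patterns[name.strip()] = body
    return {name: re.compile(body) for name, body in patterns.items()}

compiled = expand_definitions(r"""
a=\d\w\w\d
b={{a}},{{a}}
""")
print(compiled["b"].pattern)
print(compiled["b"].match("1aw2,5cx7"))
```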

import re
# Compile the sub-pattern once so that it can be reused later
digit_letter_letter_digit = re.compile(r"\d\w\w\d")
# findall returns the matched strings; finditer would yield match objects instead
all_finds = digit_letter_letter_digit.findall("1aw2,5cx7")
for value in all_finds:
    print(digit_letter_letter_digit.match(value))

Since you're already using re, why not use string processing to manage the pattern repetition as well:

pattern = "P,P".replace("P",r"\d\w\w\d")

re.match(pattern, "1aw2,5cx7")

or

P = r"\d\w\w\d"

re.match(f"{P},{P}", "1aw2,5cx7")

Try using a backreference; I believe something like the pattern below works to match

1aw2,5cx7

You could use

(\d\w\w\d),\1

See here for reference: http://www.regular-expressions.info/backref.html
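One caveat, echoing the question: a backreference matches the same text the group captured, rather than re-applying the group's pattern. A quick check with the standard library (the variable name backref is illustrative):

```python
import re

backref = re.compile(r"(\d\w\w\d),\1")

# \1 must repeat exactly what group 1 captured.
print(backref.fullmatch("1aw2,1aw2") is not None)  # True: same text twice
print(backref.fullmatch("1aw2,5cx7") is not None)  # False: second value differs
```

So this form suits inputs where the value repeats verbatim, but not the example string from the question.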

