
How to get group name of match regular expression in Python?

The question is very basic, but I do not know how to figure out the group name from a match. Let me explain in code:

import re    
a = list(re.finditer('(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))

How do I get the name of the group that produced the a[0].group(0) match, assuming the number of named patterns can be larger?

The example is simplified to learn the basics.

I could invert the a[0].groupdict() mapping, but that would be slow.

You can get this information from the compiled expression:

>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}

This uses the RegexObject.groupindex attribute:

A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
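As a rough illustration of how groupindex can be put to work, the sketch below inverts the mapping once, so that a group number can be translated back into its name without inverting groupdict() for every match. The names index_to_name and matched_group_names and the sample text are made up for this example; the pattern is the one from the question.

```python
import re

# Pattern from the question: letters-only words or runs of digits.
pattern = re.compile(r'(?P<name>[^\W\d_]+)|(?P<number>\d+)')

# Invert groupindex once (name -> number becomes number -> name).
# index_to_name is a hypothetical helper name, not part of the re module.
index_to_name = {index: name for name, index in pattern.groupindex.items()}

def matched_group_names(text):
    """Return (group name, matched text) pairs for every match in text."""
    results = []
    for match in pattern.finditer(text):
        # lastindex is the number of the last group that took part in the match.
        results.append((index_to_name[match.lastindex], match.group(0)))
    return results

print(matched_group_names('Ala ma 2 koty'))
```

This assumes every alternative is a single named group with no nested groups, so lastindex always points at a named group.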

If you only have access to the match object, you can get to the pattern with the MatchObject.re attribute:

>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<_sre.SRE_Match object at 0x100264ad0>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}

If all you want to know is which group matched, look at the values; None means a group was never used in the match:

>>> a[0].groupdict()
{'name': 'Ala', 'number': None}

The number group was never used to match anything, because its value is None.

You can then find the names used in the regular expression with:

names_used = [name for name, value in matchobj.groupdict().items() if value is not None]

or, if there is only ever one group that can match, you can use MatchObject.lastgroup:

name_used = matchobj.lastgroup
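To see how the two approaches differ, here is a small sketch with a pattern in which several named groups can take part in the same match. The date pattern, the function name names_used, and the sample strings are assumptions made up for this illustration.

```python
import re

# Hypothetical date pattern in which the day part is optional.
date_pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?')

def names_used(match):
    """Return the names of all groups that took part in the match."""
    return [name for name, value in match.groupdict().items() if value is not None]

full = date_pattern.match('2024-05-17')
partial = date_pattern.match('2024-05')

print(names_used(full))     # all three groups participated
print(names_used(partial))  # the day group is None, so it is left out
print(partial.lastgroup)    # only the last group that matched
```

With a pattern like this, lastgroup reports only one of the participating groups, so the list comprehension is the safer choice when more than one group can match at once.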

As a side note, your regular expression has a fatal flaw: everything that \d matches is also matched by \w. You'll never see number used where name can match first. Reverse the pattern to avoid this:

>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print(match.lastgroup)
...
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print(match.lastgroup)
...
name
number

but take into account that words starting with digits will still confuse things for your simple case:

>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print(match.lastgroup, repr(match.group(0)))
...
name 'word42'
number '42'
name 'word'
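One possible way around this, offered as a sketch rather than as part of the answer above, is to use the letters-only class [^\W\d_] from the question for name, so that a name can never swallow digits. The function name classify is made up for this example, and whether splitting 'word42' into two tokens is acceptable depends on your data.

```python
import re

# name uses [^\W\d_] (word characters minus digits and underscore), so the
# two alternatives can no longer overlap.
pattern = re.compile(r'(?P<number>\d+)|(?P<name>[^\W\d_]+)')

def classify(text):
    """Return (group name, matched text) pairs for every match in text."""
    return [(m.lastgroup, m.group(0)) for m in pattern.finditer(text)]

print(classify('word42 42word'))
# [('name', 'word'), ('number', '42'), ('number', '42'), ('name', 'word')]
```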

First of all, your regular expression is syntactically wrong: you should write it as r'(?P<name>\w+)|(?P<number>\d+)'. Moreover, even this regular expression does not work, since the special sequence \w matches all alphanumeric characters and hence also all characters matched by \d. You should change it to r'(?P<number>\d+)|(?P<name>\w+)' to give \d precedence over \w.

However, you can get the name of the matching group by using the lastgroup attribute of the match objects, i.e.:

[m.lastgroup for m in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma 123 kota')]

producing:

['name', 'name', 'number', 'name']
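Since the question mentions that the number of named patterns can be larger, here is a minimal sketch of how lastgroup scales to more alternatives. The punct group, the names TOKEN_PATTERN and tokenize, and the sample sentence are hypothetical additions for illustration only.

```python
import re

# Hypothetical token types for illustration; only one alternative can match
# at a time, so lastgroup names the alternative that matched.
TOKEN_PATTERN = re.compile(
    r'(?P<number>\d+)'
    r'|(?P<name>[^\W\d_]+)'
    r'|(?P<punct>[.,!?])'
)

def tokenize(text):
    """Return (group name, matched text) pairs; unmatched characters are skipped."""
    return [(m.lastgroup, m.group(0)) for m in TOKEN_PATTERN.finditer(text)]

print(tokenize('Ala ma 3 koty!'))
# [('name', 'Ala'), ('name', 'ma'), ('number', '3'), ('name', 'koty'), ('punct', '!')]
```

Adding another token type then only requires another named alternative; the lookup itself does not change.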

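import re

# Raw string, so that \s and \/ reach the regex engine unchanged.
name_pattern = r"(((\s+)?)((?P<HeadCount>[0-9]{1,2})(?P<LastName>[A-Z]{1,})((([\/]{1,})?)((?P<FirstName>[A-Z]{1,})?)){0,}){1,})"

name_text = "1GILL/HAROONCONSTANTSHER 1HAROON/ANILAMS"
for match in re.finditer(name_pattern, name_text):
    # Match objects can be subscripted by group name since Python 3.6.
    print(match["LastName"])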
