How to get the group name of a regular expression match in Python?
The question is very basic, but I do not know how to figure out the group name from a match. Let me explain in code:
import re
a = list(re.finditer('(?P<name>[^\W\d_]+)|(?P<number>\d+)', 'Ala ma kota'))
How do I get the name of the group behind the a[0].group(0) match, assuming the number of named patterns can be larger?
The example is simplified to learn the basics.
I could invert the match's a[0].groupdict(), but that would be slow.
You can get this information from the compiled expression:
>>> pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')
>>> pattern.groupindex
{'name': 1, 'number': 2}
This uses the RegexObject.groupindex attribute (called Pattern.groupindex in current Python 3 documentation):
A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.
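Because groupindex maps names to numbers, inverting it gives a number-to-name lookup, which helps when you already have a group number and want its name. A minimal sketch, assuming Python 3; the variable names are illustrative only:

```python
import re

pattern = re.compile(r'(?P<name>\w+)|(?P<number>\d+)')

# groupindex maps name -> number; invert it to look a name up by number.
number_to_name = {number: name for name, number in pattern.groupindex.items()}

# Names listed in group-number order, i.e. the order they appear in the pattern.
names_in_order = [number_to_name[i] for i in sorted(number_to_name)]

print(number_to_name)   # {1: 'name', 2: 'number'}
print(names_in_order)   # ['name', 'number']
```

The inversion is built once per compiled pattern, not once per match, so it stays cheap even when the pattern has many named groups.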
If you only have access to the match object, you can get to the pattern through the MatchObject.re attribute:
>>> a = list(re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'Ala ma kota'))
>>> a[0]
<re.Match object; span=(0, 3), match='Ala'>
>>> a[0].re.groupindex
{'name': 1, 'number': 2}
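Putting m.re.groupindex together with Match.start(), which returns -1 for a group that did not take part in the match, gives a helper that needs nothing but the match object. A sketch under that assumption; matched_group_names is a hypothetical helper name, and the number group is listed first so that digits get their own group:

```python
import re

def matched_group_names(m):
    """Return the names of the named groups that took part in match m.

    m.re is the compiled pattern; its groupindex maps each group name to
    its group number, so no separate reference to the pattern is needed.
    """
    return [name for name, index in m.re.groupindex.items()
            if m.start(index) != -1]  # -1 means the group did not participate

matches = list(re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala 42'))
print([matched_group_names(m) for m in matches])  # [['name'], ['number']]
```

Checking start() rather than the group's value also treats a group that matched the empty string as "used".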
If all you want to know is which group matched, look at the values; None means that a group did not take part in the match:
>>> a[0].groupdict()
{'name': 'Ala', 'number': None}
The number group did not match anything, because its value is None.
You can then find the names used in the regular expression with:
names_used = [name for name, value in matchobj.groupdict().items() if value is not None]
or, if only one group can ever match, you can use MatchObject.lastgroup:
name_used = matchobj.lastgroup
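lastgroup is the name of the group at lastindex, so it can mislead when that group is unnamed or when groups are nested. A small sketch of both cases; the patterns here are made up for illustration:

```python
import re

# lastgroup names the group at lastindex, or is None if that group is unnamed.
m = re.match(r'(?P<word>\w+) (\d+)', 'word 42')
print(m.lastindex, m.lastgroup)   # 2 None

# With nested groups the outer group closes last, so it is the one reported.
m = re.match(r'(?P<outer>(?P<inner>\d+))', '42')
print(m.lastindex, m.lastgroup)   # 1 outer
```

This is why lastgroup is only a safe shortcut when exactly one flat, named group can match.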
As a side note, your regular expression has a fatal flaw: everything that \d matches is also matched by \w, so you'll never see number used where name can match first. Reverse the pattern to avoid this:
>>> for match in re.finditer(r'(?P<name>\w+)|(?P<number>\d+)', 'word 42'):
...     print(match.lastgroup)
...
name
name
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word 42'):
...     print(match.lastgroup)
...
name
number
But take into account that words starting with digits will still confuse things in your simple case:
>>> for match in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'word42 42word'):
...     print(match.lastgroup, repr(match.group(0)))
...
name 'word42'
number '42'
name 'word'
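One possible workaround, sketched under the assumption that a number should only count when it forms a whole word, is to anchor the number alternative with \b word boundaries. TOKEN and classify are hypothetical names:

```python
import re

# Assumption: a "number" is a run of digits that is a whole word, so
# tokens such as 'word42' and '42word' are classified as names instead.
TOKEN = re.compile(r'(?P<number>\b\d+\b)|(?P<name>\w+)')

def classify(text):
    """Return (group name, matched text) pairs for each token in text."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]

print(classify('word42 42word 42'))
# [('name', 'word42'), ('name', '42word'), ('number', '42')]
```

Whether '42word' should be a name, an error, or two tokens depends on what your input means; the boundaries only make the choice explicit.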
First of all, your regular expression should be written as a raw string, r'(?P<name>\w+)|(?P<number>\d+)'; without the r prefix, \w and \d are invalid string escape sequences, which recent Python versions warn about. Moreover, even this regular expression does not work, since the special sequence \w matches all alphanumeric characters and hence also all characters matched by \d. You should change it to r'(?P<number>\d+)|(?P<name>\w+)' to give \d precedence over \w. You can then get the name of the matching group from the lastgroup attribute of the match objects, i.e.:
[m.lastgroup for m in re.finditer(r'(?P<number>\d+)|(?P<name>\w+)', 'Ala ma 123 kota')]
producing:
['name', 'name', 'number', 'name']
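Because lastgroup works from the match alone, the same approach scales to any number of named patterns, which the question asks about. A hedged sketch, similar in spirit to the tokenizer example in the re module documentation; TOKEN_SPEC, its entries and the tokens() helper are hypothetical:

```python
import re

# Hypothetical token specification: (group name, regex) pairs, tried in order.
# Earlier alternatives win, so more specific patterns should come first.
TOKEN_SPEC = [
    ('number', r'\d+'),
    ('name', r'[^\W\d_]+'),   # letters only, as in the question
    ('punct', r'[^\w\s]'),
]

# Join every entry into one alternation of named groups.
TOKEN_RE = re.compile('|'.join(f'(?P<{name}>{regex})' for name, regex in TOKEN_SPEC))

def tokens(text):
    """Yield (group name, matched text) for each token found in text."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.group()

print(list(tokens('Ala ma 2 koty!')))
# [('name', 'Ala'), ('name', 'ma'), ('number', '2'), ('name', 'koty'), ('punct', '!')]
```

Adding a new token kind is then a one-line change to the list, and no dictionary inversion is needed per match.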
Named groups can also be read by subscripting the match object (Python 3.6+):
import re

name_pattern = r"(((\s+)?)((?P<HeadCount>[0-9]{1,2})(?P<LastName>[A-Z]{1,})((([\/]{1,})?)((?P<FirstName>[A-Z]{1,})?)){0,}){1,})"
name_text = "1GILL/HAROONCONSTANTSHER 1HAROON/ANILAMS"
for match in re.finditer(name_pattern, name_text):
    print(match["LastName"])