
Capturing named groups in regex with re.findall

When I was trying to answer the question "regex to split %ages and values in python", I noticed that I had to reorder the groups in the result of findall. For example:

import re

data = """34% passed 23% failed 46% deferred"""
result = {key: value for value, key in re.findall(r'(\w+)%\s(\w+)', data)}
print(result)
# Output as originally posted (key order is arbitrary before Python 3.7):
# {'failed': '23', 'passed': '34', 'deferred': '46'}

Here the result of the findall is:

>>> re.findall(r'(\w+)%\s(\w+)', data)
[('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]

Is there a way to change or specify the order of the groups so that re.findall returns:

[('passed', '34'), ('failed', '23'), ('deferred', '46')]

Just to clarify, the question is:

Is it possible to specify the order of, or reorder, the groups returned by the re.findall function?

I used the example above, which builds a dictionary, to give a use case for wanting to change the order (the second group becomes the key and the first group becomes the value).

Further clarification:

To handle groups in larger, more complicated regexes you can name the groups, but those names are only accessible through the match objects returned by re.search or re.match. From what I have read, findall uses fixed indices for the groups returned in each tuple. The question is whether anyone knows how those indices could be modified. This would make handling groups easier and more intuitive.
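To illustrate the fixed indices described above, here is a minimal sketch. The group names percentage and word are labels assumed for this example, not something the original pattern defines. It shows that findall still returns plain tuples in pattern order even when the groups are named, and that the compiled pattern's groupindex attribute exposes the name-to-number mapping.

```python
import re

data = "34% passed 23% failed 46% deferred"

# Same pattern as in the question, with hypothetical group names added.
pattern = re.compile(r"(?P<percentage>\w+)%\s(?P<word>\w+)")

# findall ignores the names and returns plain tuples in pattern order.
rows = pattern.findall(data)

# groupindex maps each group name to its 1-based group number.
index = dict(pattern.groupindex)

print(rows)
print(index)
```

The tuple positions follow the order in which the groups appear in the pattern, so index["word"] - 1 is the position of the word inside each tuple.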

Take 3, based on a further clarification of the OP's intent in this comment.

Ashwin is correct that findall does not preserve named capture groups (e.g. (?P<name>regex)). finditer to the rescue! It returns the individual match objects one by one. Simple example:

import re

data = """34% passed 23% failed 46% deferred"""
for m in re.finditer(r'(?P<percentage>\w+)%\s(?P<word>\w+)', data):
    print(m.group('percentage'), m.group('word'))
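Building on the finditer loop, one possible way to get the tuples in the order the question asks for is to request the groups by name in the desired order. This is only a sketch: the variable names pairs, records and result are made up for illustration.

```python
import re

data = "34% passed 23% failed 46% deferred"
pattern = re.compile(r"(?P<percentage>\w+)%\s(?P<word>\w+)")

# m.group() with several names returns a tuple in the order requested,
# so the word can come first regardless of its position in the pattern.
pairs = [m.group("word", "percentage") for m in pattern.finditer(data)]

# groupdict() returns all named groups of a single match as a dict.
records = [m.groupdict() for m in pattern.finditer(data)]

# The reordered pairs can be passed straight to dict().
result = dict(pairs)

print(pairs)
print(records[0])
print(result)
```

Because the order is chosen at the call site, the pattern itself does not need to change when a different group order is wanted.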

As you've identified in your second example, re.findall returns the groups in the original order.

The problem is that, before Python 3.7, the standard dict type did not preserve the order of keys in any way. The Python 2.x manual makes this explicit: https://docs.python.org/2/library/stdtypes.html#dict.items

Since Python 3.7, dict preserves insertion order as a language guarantee, so the issue only arises on older versions. On those versions, use collections.OrderedDict instead:

import re
from collections import OrderedDict as odict

data = """34% passed 23% failed 46% deferred"""
result = odict((key, value) for value, key in re.findall(r'(\w+)%\s(\w+)', data))
print(result)
# Output as originally posted:
# OrderedDict([('passed', '34'), ('failed', '23'), ('deferred', '46')])

Notice that you must use the pairwise constructor form (odict((k, v) for k, v in ...)) rather than a dict comprehension ({k: v for k, v in ...}). The comprehension constructs an instance of the plain dict type, which on Python versions before 3.7 cannot be converted to an OrderedDict without losing the order of the keys, and that order is of course what you are trying to preserve in the first place.
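For comparison, here is a small sketch that builds the mapping both ways. It assumes Python 3.7 or later for the plain dict part, since that is the version from which insertion order is guaranteed by the language. The names matches, ordered and plain are chosen for this example only.

```python
import re
from collections import OrderedDict

data = "34% passed 23% failed 46% deferred"
matches = re.findall(r"(\w+)%\s(\w+)", data)

# Pairwise constructor: keeps match order on any version with OrderedDict.
ordered = OrderedDict((key, value) for value, key in matches)

# Dict comprehension: keeps match order only on Python 3.7+ (assumption).
plain = {key: value for value, key in matches}

print(list(ordered.items()))
print(list(plain.items()))
```

On a modern interpreter the two lists of items are the same, so OrderedDict is mainly needed when the code must also run on older versions.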

Per the OP's comment on my first answer: if you are simply trying to reorder a list of 2-tuples like this:

[('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]

... to look like this, with the elements of each tuple reversed:

[('passed', '34'), ('failed', '23'), ('deferred', '46')]

There's an easy solution: use a list comprehension with the slicing syntax sequence[::-1] to reverse the order of the elements of each tuple:

a = [('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]
b = [x[::-1] for x in a]
print(b)
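Slicing with [::-1] only reverses. If an arbitrary order is wanted, for instance with more than two groups, two alternatives that might fit are tuple unpacking in the comprehension and operator.itemgetter. The sketch below uses illustrative variable names (swapped, reorder, also_swapped) that are not part of the original answer.

```python
from operator import itemgetter

a = [("34", "passed"), ("23", "failed"), ("46", "deferred")]

# Tuple unpacking: name each element, then emit them in the order wanted.
swapped = [(word, pct) for pct, word in a]

# itemgetter with several indices returns a tuple in the order listed.
reorder = itemgetter(1, 0)
also_swapped = [reorder(row) for row in a]

print(swapped)
print(also_swapped)
```

The itemgetter form scales to longer tuples, for example itemgetter(2, 0, 1) for three groups, without rewriting the comprehension.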
