Python regex match groups have more than expected object

I am using re.search to parse a line for three separate pieces of data (date, temperature and pressure). The line looks like this:

line= "2015-10-08-22-50   27.3   1015.03"

I want to use pattern matching so that the parsing is robust against malformed lines. Using split has failed for that reason.

I built the following regex:

m= re.search("^(2\d{3}-\d{2}-\d{2}-\d{2}-\d{2})\s+(\d+.\d+)\s+(\d+.\d+)$", line)

The parsing is fine; however, the match groups surprised me.

>>> m.groups(1)
('2015-10-08-23-00', '27.3', '1014.99')
>>> m.groups(2)
('2015-10-08-23-00', '27.3', '1014.99')
>>> m.groups(3)
('2015-10-08-23-00', '27.3', '1014.99')

I had (naively) expected:

>>> m.groups(1)
('2015-10-08-23-00')
>>> m.groups(2)
('27.3')
>>> m.groups(3)
('1014.99')

For now I work around this by using indices.

dt= m.groups(1)[0]
t = m.groups(2)[1]
p = m.groups(3)[2]

I conclude that the regex I believed was OK must be flawed, or at least not as clean as it could be.

What am I missing?

Thanks, Gert

Instead of:

m.groups(1)

I think you want:

m.groups()[0]

The parameter to groups() is a default value, not a position in the tuple it returns. So you don't need to pass it anything, but you do need to index the tuple it returns.
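To make the role of that default concrete, here is a minimal sketch. The second pattern, with an optional group, is a hypothetical one made up purely for illustration, and the dots in the first pattern are escaped (`\.`), which is a small tightening not present in the question.

```python
import re

line = "2015-10-08-22-50   27.3   1015.03"
# Same structure as the pattern in the question, with the dots escaped.
pattern = r"^(2\d{3}-\d{2}-\d{2}-\d{2}-\d{2})\s+(\d+\.\d+)\s+(\d+\.\d+)$"
m = re.search(pattern, line)

# The argument never selects a group, so all three calls give the same tuple.
same = m.groups() == m.groups(1) == m.groups(2)
print(same)

# Hypothetical pattern whose second group is optional and does not
# participate when the input is just "42".
opt = re.search(r"(\d+)(?:\s+(\d+))?", "42")
without_default = opt.groups()      # missing group is reported as None
with_default = opt.groups("n/a")    # missing group is replaced by the default
print(without_default, with_default)
```

The default only shows up for groups that took no part in the match; every group that did match is returned unchanged.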

help(m.groups)
Help on built-in function groups:

groups(...)
    groups([default=None]) -> tuple.

    Return a tuple containing all the subgroups of the match, from 1.
    The default argument is used for groups
    that did not participate in the match
To get a single parenthesized subgroup, use group, not groups:

print(m.group(1))

2015-10-08-22-50

print(m.group(2))
27.3

print(m.group(3))
1015.03

print(m.group(1,3))

('2015-10-08-22-50', '1015.03')
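If all three values are needed at once, tuple unpacking is another option. The sketch below assumes the same line as in the question and a pattern with escaped dots; the variable names are only illustrative.

```python
import re

line = "2015-10-08-22-50   27.3   1015.03"
pattern = r"^(2\d{3}-\d{2}-\d{2}-\d{2}-\d{2})\s+(\d+\.\d+)\s+(\d+\.\d+)$"
m = re.search(pattern, line)

# group(0), or group() with no argument, is the whole matched text.
whole = m.group(0)

# groups() returns one tuple with every subgroup, so it unpacks directly.
dt, t, p = m.groups()

# group() with several arguments returns just those subgroups as a tuple.
dt_again, p_again = m.group(1, 3)
print(dt, t, p)
```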

The argument to m.groups() is not which capture group to return. It's an optional default value to use for any capture groups that didn't match anything. Either way, the function returns a tuple containing all the capture groups, and you have to index it to get a particular one.

There is nothing wrong with your pattern.

You could use named groups to make the intent clearer:

>>> pat=re.compile(r"""^(?P<date>2\d{3}-\d{2}-\d{2}-\d{2}-\d{2})\s+
...                     (?P<temp>\d+.\d+)\s+
...                     (?P<pres>\d+.\d+)$""", re.X)
>>> line= "2015-10-08-22-50   27.3   1015.03"
>>> m=pat.search(line)

This lets you get the groups as a dictionary or look them up by name:

>>> m.groupdict()
{'date': '2015-10-08-22-50', 'temp': '27.3', 'pres': '1015.03'}
>>> m.group('date')
'2015-10-08-22-50'
>>> m.group('temp')
'27.3'

The groups can still be accessed by number as usual:

>>> m.group(1)
'2015-10-08-22-50'
>>> m.group(2)
'27.3'

Or you can index the tuple returned by groups(), but I don't think that is very clear (as you seem to have discovered...):

>>> m.groups()[2]
'1015.03'

If you use named groups, you get the best of all worlds.
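Since the original goal was robustness against malformed lines, the named groups can be wrapped in a small helper. This is only a sketch: `parse_line` and `LINE_RE` are hypothetical names, the dots are escaped (unlike the pattern above), and the `strptime` format string is an assumption about what the date field means (year-month-day-hour-minute).

```python
import re
from datetime import datetime

# Verbose pattern with named groups; whitespace inside is ignored under re.X.
LINE_RE = re.compile(
    r"""^(?P<date>2\d{3}-\d{2}-\d{2}-\d{2}-\d{2})\s+
         (?P<temp>\d+\.\d+)\s+
         (?P<pres>\d+\.\d+)$""",
    re.X,
)

def parse_line(line):
    """Return (datetime, temperature, pressure), or None for a malformed line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    try:
        # Assumed layout of the date field: year-month-day-hour-minute.
        when = datetime.strptime(m.group("date"), "%Y-%m-%d-%H-%M")
    except ValueError:
        # Digits in the right shape but not a real date, e.g. month 13.
        return None
    return when, float(m.group("temp")), float(m.group("pres"))

good = parse_line("2015-10-08-22-50   27.3   1015.03")
bad_shape = parse_line("2015-10-08 27.3")
bad_date = parse_line("2015-13-40-22-50   27.3   1015.03")
print(good, bad_shape, bad_date)
```

Returning None instead of raising keeps the calling loop simple: lines that do not parse can be skipped or logged.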
