
Trying to understand and modify a regexp rule for time in the format [[hh:]mm:]ss, matching from right to left

I'm trying to understand the following regexp rule:

import re

time_format = r"(?:(?P<weeks>\d+)\W*(?:weeks?|w),?)?\W*(?:(?P<days>\d+)\W*(?:days?|d),?)?\W*(?:(?P<hours>\d+):(?P<minutes>\d+)(?::(?P<seconds>\d+)(?:\.(?P<microseconds>\d+))?)?)?"
time_matcher = re.compile(time_format)
time_matches = time_matcher.match(td_str)

With this rule, if I set td_str = '0:10' I get the following result:

{'days': None,
 'hours': '0',
 'microseconds': None,
 'minutes': '10',
 'seconds': None,
 'weeks': None}

If I set td_str = '0:0:10' I get the following result:

{'days': None,
 'hours': '0',
 'microseconds': None,
 'minutes': '0',
 'seconds': '10',
 'weeks': None}

How do I have to change the regexp rule so that '0:10' is interpreted as 0 minutes + 10 seconds? Additionally, '1:20:1' should be interpreted as 1 hour + 20 minutes + 1 second.

So the regexp rule that I want to create (as far as I understand regexps) is: [[H:]M:]S

EDIT1: I believe I've constructed a correct rule for [M:]S:

time_format = r"((?P<minutes>\d+)?:?)(?P<seconds>\d+)"

Can anybody confirm that this is the correct way of doing it?

EDIT2: Expanding on the rule shown in EDIT1, the following does work (sometimes):

time_format = r"((((?P<hours>\d+)?:?)(?P<minutes>\d+))?:?)(?P<seconds>\d+)"

However, if I say time='1:10', then this gets translated incorrectly to 1 hour, 1 minute and 0 seconds, instead of 1 minute and 10 seconds.
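The misparse is easy to reproduce. The sketch below runs the EDIT2 pattern against '1:10' and collects the named groups. Because every colon in the pattern is optional, the engine first lets minutes take '10', finds no digit left for seconds, and backtracks by handing the last digit of minutes to seconds. The result is hours '1', minutes '1', seconds '0'.

```python
import re

# Pattern from EDIT2: every colon is optional, so adjacent digits can be
# split between two groups without a colon separating them.
time_format = r"((((?P<hours>\d+)?:?)(?P<minutes>\d+))?:?)(?P<seconds>\d+)"

groups = re.match(time_format, "1:10").groupdict()
print(groups)
```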

EDIT3: This is how I've solved the problem for now, not using regexps. I would still love to know how to accomplish the same using regexps.

# defaults
days = 0
hours = 0
minutes = 0
seconds = 0
microseconds = 0

split_fields = time_string.split(':')
nbr_fields = len(split_fields)

if nbr_fields == 0: # should never happen
    pass
elif nbr_fields == 1:
    seconds = int(split_fields[0])
elif nbr_fields == 2:
    minutes = int(split_fields[0])
    seconds = int(split_fields[1])
elif nbr_fields == 3:
    hours = int(split_fields[0])
    minutes = int(split_fields[1])
    seconds = int(split_fields[2])
else: # in case there's more than 3 fields ...
    hours = int(split_fields[-3])
    minutes = int(split_fields[-2])
    seconds = int(split_fields[-1])
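The same right-to-left assignment can probably be written more compactly by left-padding the field list with zeros and keeping the last three entries. This is only a sketch: the function name parse_time is hypothetical, and like the code above it raises ValueError on non-numeric fields and silently ignores any fields beyond the last three.

```python
def parse_time(time_string):
    """Return (hours, minutes, seconds) parsed from '[[hh:]mm:]ss'."""
    fields = [int(part) for part in time_string.split(":")]
    # Left-pad with zeros, then keep the last three entries so that the
    # rightmost field is always treated as seconds.
    hours, minutes, seconds = ([0, 0] + fields)[-3:]
    return hours, minutes, seconds


print(parse_time("0:10"))
print(parse_time("1:20:1"))
```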

The part of the regex that matches seconds is optional, so it can be written, as you have done, as a group followed by a ? character. The same applies to the microseconds field.

Taking just the H:M[:S[.USEC]] part of the regex would yield something like this:

(?P<hours>\d+):(?P<minutes>\d+)(?::(?P<seconds>\d+)(\.(?P<microseconds>\d+))?)?

It's not always necessary to use regexes to do this kind of matching. Sometimes it's easier to write your own parser which splits the elements, e.g. using string.split(':'). It may be more understandable when you come back to read the code later.

(I just noticed you have a colon between seconds and microseconds. The regex listed above would have to change to account for that. As written, it will match 01:02:03.456.)
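A quick check of the H:M expression above, as a sketch: it compiles the pattern unchanged and inspects the named groups for one input with seconds and microseconds and one without. The variable names are illustrative only.

```python
import re

# The H:M expression from above, split across two string literals.
pattern = re.compile(
    r"(?P<hours>\d+):(?P<minutes>\d+)"
    r"(?::(?P<seconds>\d+)(\.(?P<microseconds>\d+))?)?"
)

full = pattern.match("01:02:03.456").groupdict()
short = pattern.match("01:02").groupdict()

print(full)   # all four fields are captured
print(short)  # seconds and microseconds are None
```

Note that with only two fields the pattern still reads them as hours and minutes, which is the behaviour the question is trying to change.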

Edit:

It's possible to structure your regex like (S)|(M:S)|(H:M:S); however, this will not work with named groups in Python's re module, since a group name cannot appear more than once. The problem is that you want the engine to look ahead and match the rightmost token first before matching those to its left. The string is scanned left to right for matches, so it is hard to describe the fields in an unambiguous manner, at least when using named groups.
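That said, a nested form that mirrors the [[hh:]mm:]ss notation from the title does seem to keep the named groups unambiguous, provided each colon is mandatory inside its optional group and the pattern is anchored at the end. The sketch below is not part of the original answer; NESTED and parse_nested are hypothetical names, and the trailing $ is an added assumption that nothing follows the seconds field.

```python
import re

# Each optional group owns its trailing colon, so a field can only be
# captured as minutes (or hours) if a colon really follows it.
NESTED = re.compile(
    r"(?:(?:(?P<hours>\d+):)?(?P<minutes>\d+):)?(?P<seconds>\d+)$"
)


def parse_nested(text):
    """Return a dict of ints; fields that are absent default to 0."""
    match = NESTED.match(text)
    if match is None:
        raise ValueError(f"not a [[hh:]mm:]ss value: {text!r}")
    return {name: int(value or 0) for name, value in match.groupdict().items()}


print(parse_nested("1:10"))
print(parse_nested("1:20:1"))
```

The engine still scans left to right. It first tries to read '1:10' as hours and minutes, fails to find the second colon, and backtracks until the fields line up from the right.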

Another solution not involving named groups is to use a more general expression such as (\d+)(:\d+)?(:\d+)? and then look at the returned groups that are not None to determine their meaning. If only 1 group matched, only S is present; if 2, M:S; and so on.
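A minimal sketch of that approach follows. The names GENERIC and parse_generic are hypothetical, the pattern is written as a raw string with single backslashes, and the trailing $ is an added assumption so that leftover input is rejected rather than ignored.

```python
import re

GENERIC = re.compile(r"(\d+)(:\d+)?(:\d+)?$")


def parse_generic(text):
    """Map the matched groups onto hours/minutes/seconds from the right."""
    match = GENERIC.match(text)
    if match is None:
        raise ValueError(f"not a [[hh:]mm:]ss value: {text!r}")
    # Keep only the groups that took part in the match and drop the
    # leading colon captured by the second and third groups.
    values = [int(g.lstrip(":")) for g in match.groups() if g is not None]
    # One value means seconds, two mean minutes:seconds, three mean h:m:s.
    names = ("hours", "minutes", "seconds")[-len(values):]
    result = {"hours": 0, "minutes": 0, "seconds": 0}
    result.update(zip(names, values))
    return result


print(parse_generic("1:10"))
print(parse_generic("1:20:1"))
```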
