Trying to understand and modify a regexp rule for times in [[hh:]mm:]ss format, matching from right to left

Question

I'm trying to understand the following regexp rule:

import re

time_format = r"(?:(?P<weeks>\d+)\W*(?:weeks?|w),?)?\W*(?:(?P<days>\d+)\W*(?:days?|d),?)?\W*(?:(?P<hours>\d+):(?P<minutes>\d+)(?::(?P<seconds>\d+)(?:\.(?P<microseconds>\d+))?)?)?"
time_matcher = re.compile(time_format)
time_matches = time_matcher.match(td_str)

With this rule, if I set td_str = '0:10' I get the following result:

{'days': None,
 'hours': '0',
 'microseconds': None,
 'minutes': '10',
 'seconds': None,
 'weeks': None}

If I set td_str = '0:0:10' I get the following result:

{'days': None,
 'hours': '0',
 'microseconds': None,
 'minutes': '0',
 'seconds': '10',
 'weeks': None}

How do I have to change the regexp rule so that '0:10' will be interpreted as 0 minutes + 10 seconds? Additionally, '1:20:1' should be interpreted as 1 hour + 20 minutes + 1 second.

So the regexp rule that I want to create (as far as I understand regexps) is: [[H:]M:]S

EDIT1: I believe I've constructed a correct rule for [M:]S:

time_format = r"((?P<minutes>\d+)?:?)(?P<seconds>\d+)"

Can anybody confirm that this is the correct way of doing it?

EDIT2: Expanding on the rule shown in EDIT1, the following does work (sometimes):

time_format = r"((((?P<hours>\d+)?:?)(?P<minutes>\d+))?:?)(?P<seconds>\d+)"

However, if I say time='1:10', then this gets translated incorrectly to 1 hour, 1 minute and 0 seconds, instead of 1 minute and 10 seconds.

EDIT3: This is how I've solved the problem for now, without using regexps. I would still love to know how to accomplish the same using regexps.

# defaults
days = 0
hours = 0
minutes = 0
seconds = 0
microseconds = 0

split_fields = time_string.split(':')
nbr_fields = len(split_fields)

if nbr_fields == 0: # should never happen
    pass
if nbr_fields == 1:
    seconds = int(split_fields[0])
elif nbr_fields == 2:
    minutes = int(split_fields[0])
    seconds = int(split_fields[1])
elif nbr_fields == 3:
    hours = int(split_fields[0])
    minutes = int(split_fields[1])
    seconds = int(split_fields[2])
else: # in case there's more than 3 fields ...
    hours = int(split_fields[-3])
    minutes = int(split_fields[-2])
    seconds = int(split_fields[-1])

Answer 1

The part of the regex that matches seconds is optional, which you have specified with the trailing ? character. The same applies to the microseconds field.

Taking just the H:M[:S[.USEC]] part of the regex would yield something like this:

(?P<hours>\d+):(?P<minutes>\d+)(?::(?P<seconds>\d+)(\.(?P<microseconds>\d+))?)?

It's not always necessary to use regexes to do this kind of matching. Sometimes it's easier to write your own parser which splits the elements, e.g. using string.split(':'). It may be more understandable when you come back to read the code later.
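A compact variant of the split-based idea might look like the sketch below. The function name parse_hms is hypothetical, and the handling of extra fields (keeping only the last three) is an assumption carried over from the EDIT3 code in the question.

```python
def parse_hms(time_string):
    """Parse '[[H:]M:]S' by assigning fields from the right."""
    fields = [int(part) for part in time_string.split(":")]
    # Reverse so index 0 is always seconds, 1 is minutes, 2 is hours,
    # then pad with zeros for any missing leading fields.
    seconds, minutes, hours = (fields[::-1] + [0, 0])[:3]
    return hours, minutes, seconds


print(parse_hms("0:10"))
print(parse_hms("1:20:1"))
```

Reversing the list is what makes the assignment right-to-left: the last field is always seconds, regardless of how many fields precede it.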

(I just noticed you have a colon between seconds and microseconds. The regex listed above would have to change to account for that. The regex listed will match 01:02:03.456.)
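As a quick check of that last claim, the sketch below compiles the fragment quoted above and applies it to the sample string. The variable names are illustrative and do not come from the original post.

```python
import re

# The H:M[:S[.USEC]] fragment quoted above, unchanged.
hms = re.compile(
    r"(?P<hours>\d+):(?P<minutes>\d+)"
    r"(?::(?P<seconds>\d+)(\.(?P<microseconds>\d+))?)?"
)

full = hms.match("01:02:03.456").groupdict()
short = hms.match("0:10").groupdict()

print(full)
print(short)
```

Note that with a two-field input such as '0:10' the fields are still assigned from the left (hours and minutes), which is the behaviour the question is trying to change.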

edit:

It's possible to structure your regex like (S)|(M:S)|(H:M:S); however, this will not work with named groups, since a group name cannot appear more than once. The problem is that you want the engine to look ahead and match the rightmost token first before matching those to the left. The string is scanned left-to-right for matches, and as a result there is no way to describe the fields in an unambiguous manner, at least not when using named groups.
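The restriction on repeated group names can be seen directly in Python's re module. The pattern below is a hypothetical two-alternative version of the idea, written only to trigger the error.

```python
import re

# Hypothetical pattern: the name "seconds" is reused in both alternatives.
pattern = (
    r"(?P<seconds>\d+)"
    r"|(?P<minutes>\d+):(?P<seconds>\d+)"
)

try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    # Compilation fails because a group name is defined twice.
    error_message = str(exc)

print(error_message)
```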

Another solution not involving named groups is to use a more general expression such as (\d+)(:\d+)?(:\d+)? and then look at the returned groups that are not None to determine their meaning. If there is 1 group, only S is present; if there are 2, it is M:S; and so on.
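A minimal sketch of that approach follows. The names GENERIC and parse_generic are hypothetical, and the use of fullmatch (to reject trailing text) is an assumption rather than something stated above.

```python
import re

# The general expression from the text: one to three colon-separated numbers.
GENERIC = re.compile(r"(\d+)(:\d+)?(:\d+)?")


def parse_generic(text):
    match = GENERIC.fullmatch(text)
    if match is None:
        return None
    # Keep only the groups that took part in the match, minus their colons.
    values = [int(g.lstrip(":")) for g in match.groups() if g is not None]
    # Left-pad with zeros so the rightmost value is always seconds.
    hours, minutes, seconds = [0] * (3 - len(values)) + values
    return {"hours": hours, "minutes": minutes, "seconds": seconds}


print(parse_generic("0:10"))
print(parse_generic("1:20:1"))
```

The number of non-None groups decides the meaning of each value, so the regex itself never has to know which field it is looking at.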

Solution 1
0 Accepted 2012-08-28 11:14:09