Trying to understand and modify a regexp rule for time in format [[hh:]mm:]ss, matching from right to left
I'm trying to understand the following regexp rule:
import re
time_format = r"(?:(?P<weeks>\d+)\W*(?:weeks?|w),?)?\W*(?:(?P<days>\d+)\W*(?:days?|d),?)?\W*(?:(?P<hours>\d+):(?P<minutes>\d+)(?::(?P<seconds>\d+)(?:\.(?P<microseconds>\d+))?)?)?"
time_matcher = re.compile(time_format)
time_matches = time_matcher.match(td_str)
With this rule, if I set td_str = '0:10', I get the following result:
{'days': None,
'hours': '0',
'microseconds': None,
'minutes': '10',
'seconds': None,
'weeks': None}
If I set td_str = '0:0:10', I get the following result:
{'days': None,
'hours': '0',
'microseconds': None,
'minutes': '0',
'seconds': '10',
'weeks': None}
How do I have to change the regexp rule so that '0:10' is interpreted as 0 minutes + 10 seconds? Additionally, '1:20:1' should be interpreted as 1 hour + 20 minutes + 1 second.
So the regexp rule that I want to create (as far as I understand regexps) is: [[H:]M:]S
EDIT1: I believe I've constructed a correct rule for [M:]S:
time_format = r"((?P<minutes>\d+)?:?)(?P<seconds>\d+)"
Can anybody confirm that this is the correct way of doing it?
EDIT2: Expanding on the rule shown in EDIT1, the following does work (sometimes):
time_format = r"((((?P<hours>\d+)?:?)(?P<minutes>\d+))?:?)(?P<seconds>\d+)"
However, if I say time='1:10', then this gets translated incorrectly to 1 hour, 1 minute and 0 seconds, instead of 1 minute and 10 seconds.
EDIT3: This is how I've solved the problem for now, not using regexps. I would still love to know how to accomplish the same using regexps.
# defaults
days = 0
hours = 0
minutes = 0
seconds = 0
microseconds = 0
split_fields = time_string.split(':')
nbr_fields = len(split_fields)
if nbr_fields == 0:  # should never happen
    pass
elif nbr_fields == 1:
    seconds = int(split_fields[0])
elif nbr_fields == 2:
    minutes = int(split_fields[0])
    seconds = int(split_fields[1])
elif nbr_fields == 3:
    hours = int(split_fields[0])
    minutes = int(split_fields[1])
    seconds = int(split_fields[2])
else:  # in case there are more than 3 fields, use the last three
    hours = int(split_fields[-3])
    minutes = int(split_fields[-2])
    seconds = int(split_fields[-1])
The part of the regex matching seconds is optional, which you have expressed by following that group with a ? character. The same applies to the microseconds field.
Taking just the H:M[:S[.USEC]] part of the regex would yield something like this:
(?P<hours>\d+):(?P<minutes>\d+)(?::(?P<seconds>\d+)(\.(?P<microseconds>\d+))?)?
It's not always necessary to use regexes to do this kind of matching. Sometimes it's easier to write your own parser which splits the elements, e.g. using string.split(':'). It may be more understandable when you come back to read the code later.
(I just noticed you have a colon between seconds and microseconds. The regex listed above would have to change to account for that. The regex listed will match 01:02:03.456.)
edit:
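As a quick check of the pattern above, here is a minimal sketch that runs it against two inputs. The variable names (hm_pattern, full, short) are made up for illustration. It also shows that this pattern on its own still reads '0:10' as hours and minutes, which is the behaviour the question is trying to change.

```python
import re

# The H:M[:S[.USEC]] fragment quoted above, compiled unchanged.
hm_pattern = re.compile(
    r"(?P<hours>\d+):(?P<minutes>\d+)"
    r"(?::(?P<seconds>\d+)(\.(?P<microseconds>\d+))?)?"
)

# All four fields are present in this input.
full = hm_pattern.match("01:02:03.456").groupdict()

# Only two fields: they land in hours and minutes, and the
# optional seconds/microseconds groups are reported as None.
short = hm_pattern.match("0:10").groupdict()

print(full)
print(short)
```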
It's possible to structure your regex like (S)|(M:S)|(H:M:S), however this will not work with named groups since the group name cannot appear more than once. The problem is that you want the engine to look ahead and match the rightmost token first before matching those to the left. The string will be scanned left-to-right for matches and as a result there is no way to describe the fields in an unambiguous manner, at least not when using named groups.
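The restriction on repeated group names is easy to confirm with Python's re module. In the sketch below, the helper name compiles and the two example patterns are hypothetical, chosen only to illustrate the point: an alternation that reuses the name seconds is rejected at compile time.

```python
import re

def compiles(pattern):
    """Return True if `pattern` is accepted by re.compile, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# (S)|(M:S) written with named groups: 'seconds' appears twice.
duplicate_names = r"(?P<seconds>\d+)|(?P<minutes>\d+):(?P<seconds>\d+)"

# The same alternation with plain numbered groups.
numbered_groups = r"(\d+)|(\d+):(\d+)"

print(compiles(duplicate_names))
print(compiles(numbered_groups))
```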
Another solution not involving named groups is to use a more general expression such as (\d+)(:\d+)?(:\d+)? and then look at the returned groups that are not None to determine their meaning. If there's 1 group, only S is present; if there are 2, it's M:S; and so on.
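A minimal sketch of that idea follows. The names TIME_RE and parse_time are made up for illustration, and the pattern is a slight variation on the one above: the colons sit outside the capturing groups so each group holds digits only, and fullmatch is used so trailing text is rejected. The non-None groups are counted and assigned from the right.

```python
import re

# One to three colon-separated numbers; colons are kept out of the captures.
TIME_RE = re.compile(r"(\d+)(?::(\d+))?(?::(\d+))?")

def parse_time(text):
    """Parse '[[H:]M:]S' and return (hours, minutes, seconds) as ints."""
    match = TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid time string: {text!r}")
    # Keep only the groups that took part in the match.
    fields = [int(group) for group in match.groups() if group is not None]
    # Left-pad with zeros so the rightmost field is always seconds.
    hours, minutes, seconds = [0] * (3 - len(fields)) + fields
    return hours, minutes, seconds

print(parse_time("10"))
print(parse_time("0:10"))
print(parse_time("1:20:1"))
```

Padding from the left is what gives the right-to-left interpretation: the regex still matches left to right, and the meaning of each field is decided afterwards from how many fields were found.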