Detect patterns and extract numeric values to evaluate them, and in some cases replace them with modified values in their original position

Question

import re

input_text = "entre las 15 : hs -- 16:10 "  #Example 1
input_text = "entre las 21 :  -- 22"  #Example 2
input_text = "entre la 1 30 -- 2 "  #Example 3
input_text = "entre la 1 09 h.s. -- 6 : hs."  #Example 4
input_text = "entre la 1:50 -- 6 :"  #Example 5
input_text = "entre la 7 59 -- 23 : "  #Example 6
input_text = "entre la 10: -- : 10"  #Example 7

print(repr(input_text)) #print the output string

The function fix_time_patterns_in_time_intervals() should look something like the code below, although it may need exception handling for possible index errors. It should only do the replacement if the hours (the first group) are at most 23, since there is no such thing as a 25th hour in a day. Likewise, it should only do the replacement if the minutes (the second group) are at most 59, since an hour cannot have more than 60 minutes and the 60th minute already counts as minute 0 of the next hour. When these conditions are not met, the function should simply return the same substring that was extracted from the original string.

def fix_time_patterns_in_time_intervals(match_num_time):
    hour_exist = False
    if(int(match_num_time[1]) <= 23):
        #do the replacement process
        if(len(match_num_time[1]) == 1): match_num_time[1] = "0"+ str(match_num_time[1])
        elif(len(match_num_time[1]) == 0): match_num_time[1] = "00"
        hour_exist = True
    if(int(match_num_time[2]) <= 59):
        #do the replacement process
        if(len(match_num_time[2]) == 1): match_num_time[2] = "0"+ str(match_num_time[2])
        elif(len(match_num_time[2]) == 0): match_num_time[2] = "00"
    elif( (int(match_num_time[2]) == None) and (hour_exist == True) ):
        #do the replacement process
        match_num_time[2] = "00"

    return match_num_time #the extracted substring

I think I could use regex capturing groups with the match object's group() or groups() methods to extract the first time mentioned in the input string, and then the other time that appears in it.

At the end you should be able to print the original string and obtain these results (output) for each of the examples, respectively:

"entre las 15:00 hs -- 16:10 "  #Example 1
"entre las 21:00 -- 22:00"  #Example 2
"entre la 01:30 -- 02:00 "  #Example 3
"entre la 01:09 h.s. -- 06:00 hs."  #Example 4
"entre la 01:50 -- 06:00"  #Example 5
"entre la 07:59 -- 23:00"  #Example 6
"entre la 10:00 -- 00:10"  #Example 7

Some additional examples of what the time (hours:minutes) conversions should look like:

"6 :"      --->     "06:00"
"6:"       --->     "06:00"
"6"        --->     "06:00"
": 6"      --->     "00:06"
":6"       --->     "00:06"
": 16"     --->     "00:16"
":16"      --->     "00:16"
" 6"       --->     "06:00"
"15 : 1"   --->     "15:01"
"15 1"     --->     "15:01"
": 15"     --->     "00:15"
"0 15"     --->     "00:15"

I am having trouble extracting the values to evaluate inside fix_time_patterns_in_time_intervals() after identifying them with the regex. I hope you can help me with this.

Answer 1

You can use this regex to match your time values:

(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )

This matches:

(?=[:\d]) : assert that the match starts with a digit or a : - this ensures that we always start by matching the hour group if it is present
(?P<hour>\d+)? : optional digits captured in the hour group
 *:? * : an optional : surrounded by optional spaces
(?P<minute>\d+)? : optional digits captured in the minute group
(?<! ) : assert that the match doesn't end in a space so we don't chew up spaces used for formatting
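
To see what each group captures, here is a minimal sketch. The sample string is Example 7 from the question, and the names pattern, sample and captures are only illustrative:

```python
import re

# Same pattern as above; `pattern` is an illustrative name.
pattern = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

sample = "entre la 10: -- : 10"

# Collect (matched text, hour group, minute group) for every match.
# A group that did not take part in the match is reported as None.
captures = [(m.group(0), m.group('hour'), m.group('minute'))
            for m in pattern.finditer(sample)]

for text, hour, minute in captures:
    print(repr(text), hour, minute)
```

The first match has an hour but no minute, and the second has a minute but no hour, which is why the replacement function has to cope with missing groups.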

Regex demo on regex101

You can then use this replacement function to check for the existence of the match groups and (if the values are valid) reformat them with leading zeros as required:

def fix_time_patterns_in_time_intervals(match_num_time):
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'
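
As a side note on why the attempt in the question runs into trouble, here is a small sketch (the pattern is the one from this answer; the sample string and variable names are illustrative). A match object can be read by index or by name, but it cannot be assigned to, and a group that did not participate comes back as None, which int() rejects. That is why the function above builds a new string and falls back to '0':

```python
import re

pattern = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')
m = pattern.search("entre la 6 : hs.")

# Reading groups works by number or by name.
hour_text = m.group('hour')      # the digits of the hour, as a string
minute_text = m.group('minute')  # None: the minute group did not participate

# Assigning into the match object is not supported and raises TypeError.
try:
    m[1] = "06"
    assignment_error = None
except TypeError as exc:
    assignment_error = exc

# int(None) would also raise TypeError, hence the `or '0'` fallback.
minute = int(minute_text or '0')
print(hour_text, minute_text, type(assignment_error).__name__, minute)
```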

For your sample data (with a couple of invalid values):

import re

times = [
    "entre las 15 : hs -- 16:10 ",
    "entre las 21 :  -- 22",
    "entre la 1 30 -- 2 ",
    "entre la 1 09 h.s. -- 6 : hs.",
    "entre la 25 0 -- 12:0",
    "entre las 13 64 -- 5",
    "entre la 1:50 -- 6 :",
    "entre la 7 59 -- 23 : ",
    "entre la 10: -- : 10"
]

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

for time in times:
    print(regex.sub(fix_time_patterns_in_time_intervals, time))

Output:

entre las 15:00 hs -- 16:10
entre las 21:00 -- 22:00
entre la 01:30 -- 02:00
entre la 01:09 h.s. -- 06:00 hs.
entre la 25 0 -- 12:00
entre las 13 64 -- 05:00
entre la 01:50 -- 06:00
entre la 07:59 -- 23:00
entre la 10:00 -- 00:10
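
The standalone conversions listed in the question can be checked the same way. The following is only a sketch: it repeats the pattern and function from this answer so that it runs on its own, it strips surrounding whitespace before substituting (an assumption, since the pattern deliberately leaves leading and trailing spaces alone), and cases and results are illustrative names:

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

def fix_time_patterns_in_time_intervals(match_num_time):
    # Missing groups are None, so fall back to '0' before converting.
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

# Input -> expected output, copied from the question's list of conversions.
cases = {
    "6 :": "06:00", "6:": "06:00", "6": "06:00",
    ": 6": "00:06", ":6": "00:06", ": 16": "00:16", ":16": "00:16",
    " 6": "06:00", "15 : 1": "15:01", "15 1": "15:01",
    ": 15": "00:15", "0 15": "00:15",
}

# Assumption: surrounding whitespace is stripped before substitution.
results = {raw: regex.sub(fix_time_patterns_in_time_intervals, raw.strip())
           for raw in cases}

for raw, expected in cases.items():
    print(f'{raw!r:10} -> {results[raw]!r}  (expected {expected!r})')
```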

Solution 1
2 Accepted 2022-08-29 02:31:48
