Detect a pattern and extract its numeric values, evaluate them, and in certain cases put the modified values back in their original position

import re

input_text = "entre las 15 : hs -- 16:10 "  #Example 1
input_text = "entre las 21 :  -- 22"  #Example 2
input_text = "entre la 1 30 -- 2 "  #Example 3
input_text = "entre la 1 09 h.s. -- 6 : hs."  #Example 4
input_text = "entre la 1:50 -- 6 :"  #Example 5
input_text = "entre la 7 59 -- 23 : "  #Example 6
input_text = "entre la 10: -- : 10"  #Example 7
print(repr(input_text)) #print the output string

The function fix_time_patterns_in_time_intervals() should look something like this, although it may need exception handling for possible index errors. It should only do the replacement if the hour (the first group) is at most 23, since a day only has hours 0 to 23. For the minutes (the second group), it should only do the replacement if the value is at most 59, since an hour has only 60 minutes and minute 60 already counts as minute 0 of the next hour. Because of this, the replacements should only be made when the conditionals inside the function are satisfied; otherwise the function should just put back the same substring that was extracted from the original string.

def fix_time_patterns_in_time_intervals(match_num_time):
    hour_exist = False
    if(int(match_num_time[1]) <= 23):
        #do the replacement process
        if(len(match_num_time[1]) == 1): match_num_time[1] = "0"+ str(match_num_time[1])
        elif(len(match_num_time[1]) == 0): match_num_time[1] = "00"
        hour_exist = True
    if(int(match_num_time[2]) <= 59):
        #do the replacement process
        if(len(match_num_time[2]) == 1): match_num_time[2] = "0"+ str(match_num_time[2])
        elif(len(match_num_time[2]) == 0): match_num_time[2] = "00"
    elif( (int(match_num_time[2]) == None) and (hour_exist == True) ):
        #do the replacement process
        match_num_time[2] = "00"

    return match_num_time #the extracted substring

I think I could use regex capturing groups with the match object's group() or groups() method, extracting the first time mentioned in the input string and then the other time that appears in it.
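To make the group()/groups() idea concrete, here is a minimal sketch. The pattern `simple_time` is a deliberately simplified, hypothetical one (not the final regex) and only illustrates how capture groups are read from a match object. Note that an optional group that did not take part in the match comes back as None, so it cannot be passed straight to int().

```python
import re

# Hypothetical, simplified pattern for illustration only:
# hour digits, an optional colon with optional spaces, optional minute digits.
simple_time = re.compile(r"(\d{1,2}) *:? *(\d{1,2})?")

text = "entre la 1:50 -- 6 :"

# search() returns the first match only.
first = simple_time.search(text)
first_groups = first.groups()  # all capture groups as a tuple
first_hour = first.group(1)    # a single group, selected by index

# finditer() walks over every match, so the second time is reachable too.
all_groups = [m.groups() for m in simple_time.finditer(text)]

print(first_groups)  # ('1', '50')
print(all_groups)    # [('1', '50'), ('6', None)]
```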

In the end, printing the string should give these results (output) for each of the examples, respectively:

"entre las 15:00 hs -- 16:10 "  #Example 1
"entre las 21:00 -- 22:00"  #Example 2
"entre la 01:30 -- 02:00 "  #Example 3
"entre la 01:09 h.s. -- 06:00 hs."  #Example 4
"entre la 01:50 -- 06:00"  #Example 5
"entre la 07:59 -- 23:00"  #Example 6
"entre la 10:00 -- 00:10"  #Example 7

Some additional examples of what the time (hours:minutes) conversions should look like:

"6 :"      --->     "06:00"
"6:"       --->     "06:00"
"6"        --->     "06:00"
": 6"      --->     "00:06"
":6"       --->     "00:06"
": 16"     --->     "00:16"
":16"      --->     "00:16"
" 6"       --->     "06:00"
"15 : 1"   --->     "15:01"
"15 1"     --->     "15:01"
": 15"     --->     "00:15"
"0 15"     --->     "00:15"

I am having problems extracting the values to evaluate inside fix_time_patterns_in_time_intervals() after identifying them with the regex. I hope you can help me with this.

You can use this regex to match your time values:

(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )

This matches:

  • (?=[:\d]) : assert that the match starts with a digit or a : - this ensures that we always start by matching the hour group if it is present
  • (?P<hour>\d+)? : optional digits captured in the hour group
  •  *:? * : an optional : surrounded by optional spaces
  • (?P<minute>\d+)? : optional digits captured in the minute group
  • (?<! ) : assert that the match doesn't end in a space, so we don't chew up spaces used for formatting
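As a quick way to see these pieces at work, the sketch below lists what each match covers and what the two named groups capture. The helper name `captured_parts` is made up for this illustration; the regex is the one given above.

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

def captured_parts(text):
    """Return (matched_text, hour, minute) for every match in text.

    A group that did not take part in the match is reported as None.
    """
    return [(m.group(0), m.group('hour'), m.group('minute'))
            for m in regex.finditer(text)]

# The trailing space after "10:" is not consumed, thanks to (?<! ).
print(captured_parts("entre la 10: -- : 10"))
# [('10:', '10', None), (': 10', None, '10')]

print(captured_parts("entre las 15 : hs -- 16:10 "))
# [('15 :', '15', None), ('16:10', '16', '10')]
```

The None values are why a replacement function has to supply a default before converting a group to an integer.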

Regex demo on regex101

You can then use this replacement function to check for the existence of the match groups and (if the values are valid) reformat them with leading 0's as required:

def fix_time_patterns_in_time_intervals(match_num_time):
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

For your sample data (with a couple of invalid values):

times = [
    "entre las 15 : hs -- 16:10 ",
    "entre las 21 :  -- 22",
    "entre la 1 30 -- 2 ",
    "entre la 1 09 h.s. -- 6 : hs.",
    "entre la 25 0 -- 12:0",
    "entre las 13 64 -- 5",
    "entre la 1:50 -- 6 :",
    "entre la 7 59 -- 23 : ",
    "entre la 10: -- : 10"
]

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

for time in times:
    print(regex.sub(fix_time_patterns_in_time_intervals, time))

Output:

entre las 15:00 hs -- 16:10
entre las 21:00 -- 22:00
entre la 01:30 -- 02:00
entre la 01:09 h.s. -- 06:00 hs.
entre la 25 0 -- 12:00
entre las 13 64 -- 05:00
entre la 01:50 -- 06:00
entre la 07:59 -- 23:00
entre la 10:00 -- 00:10
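The standalone conversions listed in the question can be checked the same way. This is a small self-contained sketch that repeats the regex and the replacement function; the `conversions` table and the `.strip()` call are additions for this check only (the regex leaves leading whitespace such as the space in " 6" untouched, so it is stripped before comparing).

```python
import re

regex = re.compile(r'(?=[:\d])(?P<hour>\d+)? *:? *(?P<minute>\d+)?(?<! )')

def fix_time_patterns_in_time_intervals(match_num_time):
    # A missing group is None, so fall back to '0' before converting.
    hour = int(match_num_time.group('hour') or '0')
    minute = int(match_num_time.group('minute') or '0')
    if hour > 23 or minute > 59:
        # invalid, don't convert
        return match_num_time.group(0)
    return f'{hour:02d}:{minute:02d}'

# Raw fragment -> expected result, copied from the question's list.
conversions = {
    "6 :": "06:00", "6:": "06:00", "6": "06:00",
    ": 6": "00:06", ":6": "00:06", ": 16": "00:16", ":16": "00:16",
    " 6": "06:00", "15 : 1": "15:01", "15 1": "15:01",
    ": 15": "00:15", "0 15": "00:15",
}

results = {raw: regex.sub(fix_time_patterns_in_time_intervals, raw).strip()
           for raw in conversions}

for raw, fixed in results.items():
    status = "ok" if fixed == conversions[raw] else "MISMATCH"
    print(f'{raw!r:10} ---> {fixed!r:8} {status}')
```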
