
Starting and stopping using a regular expression

In my program I use a regular expression until the word BREAK, then I use it again until the word STOP. The first part of the program takes the matches and converts them from military time to regular time. The second part divides the military time by a number the user inputs. My code works, but I use my regular expression twice. How could I change my program so that I only use the regular expression once?

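import re

with open(filename) as text:
    # First part: every line up to and including the one that starts with BREAK
    for line in text:
        pattern = re.search(r'((((2)([0-3]))|(([0-1])([0-9])))([0-5])([0-9]))', line)
        if pattern:
            ...  # convert military time to regular time (body omitted in the question)
        if re.match("BREAK", line):
            break

    # Second part: continue with the remaining lines until one starts with STOP
    for line in text:
        m = re.search(r'((((2)([0-3]))|(([0-1])([0-9])))([0-5])([0-9]))', line)
        if m:
            ...  # divide the military time by the user's number (body omitted in the question)
        if re.match("STOP", line):
            break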

Firstly, your regex r'((((2)([0-3]))|(([0-1])([0-9])))([0-5])([0-9]))' has a preposterous number of parentheses in it.

Presumably you are not using the capturing groups that all those parentheses create. You appear to want to match HHMM, where HH is 00 to 23 and MM is 00 to 59.

r'(2[0-3]|[01][0-9])[0-5][0-9]' will do the same job. You can avoid the one remaining capturing group by writing r'(?:2[0-3]|[01][0-9])[0-5][0-9]'.

You may want to avoid spurious matches (e.g. the "2345" in "blah 23456789") by, for example, putting \b at each end of the pattern.
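As a quick check of the points above, here is a small sketch comparing the original pattern, the simplified non-capturing pattern, and the \b-anchored pattern. The sample strings and the helper name first_match are made up for illustration.

```python
import re

# The pattern from the question, the simplified version, and the bounded version.
ORIGINAL = re.compile(r'((((2)([0-3]))|(([0-1])([0-9])))([0-5])([0-9]))')
SIMPLE = re.compile(r'(?:2[0-3]|[01][0-9])[0-5][0-9]')
BOUNDED = re.compile(r'\b(?:2[0-3]|[01][0-9])[0-5][0-9]\b')


def first_match(regex, s):
    """Return the text of the first match of regex in s, or None."""
    m = regex.search(s)
    return m.group(0) if m else None


# Hypothetical sample lines, not taken from the question's input file.
samples = ["depart 0930", "arrive 2359", "blah 23456789", "2460 is not a time"]
for s in samples:
    print(repr(s), first_match(ORIGINAL, s), first_match(SIMPLE, s), first_match(BOUNDED, s))

# Number of capturing groups each pattern creates.
print(ORIGINAL.groups, SIMPLE.groups, BOUNDED.groups)
```

The original and simplified patterns find the same text in every sample; only the bounded pattern rejects the "2345" buried inside "23456789".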

Here's a replacement for your code:

import re

searcher = re.compile(r'\b(?:2[0-3]|[01][0-9])[0-5][0-9]\b').search
with open(filename) as text:
    for line in text:
        m = searcher(line)
        if m:
            do_something_1(line, m)
        if line.startswith("BREAK"):  # equivalent to your code; is that what you really mean?
            break
    for line in text:
        m = searcher(line)
        if m:
            do_something_2(line, m)
        if line.startswith("STOP"):  # equivalent to your code; is that what you really mean?
            break
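If the goal is also to have a single loop, one option is to keep a flag that records whether BREAK has been seen yet. The following is only a sketch under assumptions: the helper names to_12_hour and process are hypothetical, "regular time" is taken to mean a 12-hour clock, and "dividing the military time" is taken to mean dividing the HHMM digits as an integer, since the question does not show either step.

```python
import re

TIME_RE = re.compile(r'\b(?:2[0-3]|[01][0-9])[0-5][0-9]\b')


def to_12_hour(hhmm):
    """Convert 'HHMM' to 12-hour text (assumed meaning of 'regular time')."""
    hours, minutes = int(hhmm[:2]), int(hhmm[2:])
    suffix = "AM" if hours < 12 else "PM"
    return f"{hours % 12 or 12}:{minutes:02d} {suffix}"


def process(lines, divisor):
    """Scan lines once; convert times before BREAK, divide times before STOP."""
    converted, divided = [], []
    before_break = True
    for line in lines:
        m = TIME_RE.search(line)  # the regex is used in exactly one place
        if m:
            if before_break:
                converted.append(to_12_hour(m.group(0)))
            else:
                # Assumption: divide the HHMM digits read as a plain integer.
                divided.append(int(m.group(0)) / divisor)
        if before_break and line.startswith("BREAK"):
            before_break = False
        elif not before_break and line.startswith("STOP"):
            break
    return converted, divided


# Hypothetical input; any iterable of lines works, including an open file.
sample = ["start 0930\n", "BREAK\n", "lunch 1300\n", "STOP\n", "late 2359\n"]
print(process(sample, 2))
```

As in the two-loop version, the line that starts with BREAK is still handled by the first branch, and a STOP line that appears before BREAK is ignored.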

The simplest approach is to compile the regex once and reuse it:

my_re = re.compile("your regex")
my_re.search(some_string)
my_re.search(some_other_string)

That avoids defining the regex twice.

Depending on the contents of the document, you could also split the text on 'BREAK', or collect multiple matches in one call. It is hard to say which fits best without seeing an example of the input or a fuller description of it.
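A rough sketch of the splitting idea, assuming the whole file fits in memory and that the markers BREAK and STOP each matter only at their first occurrence. The function name times_by_section is hypothetical. Unlike the line-by-line loops, this treats the markers as plain substrings anywhere in the text and collects every time on a line, not just the first.

```python
import re

TIME_RE = re.compile(r'\b(?:2[0-3]|[01][0-9])[0-5][0-9]\b')


def times_by_section(text):
    """Return (times before BREAK, times between BREAK and STOP)."""
    head, _, rest = text.partition("BREAK")   # everything before the first BREAK
    middle, _, _ = rest.partition("STOP")     # everything up to the first STOP after it
    # With no capturing groups, findall returns the full matched strings.
    return TIME_RE.findall(head), TIME_RE.findall(middle)


# Hypothetical file contents for illustration.
text = "a 0930\nb 1015 1100\nBREAK\nc 1300\nSTOP\nd 2359\n"
print(times_by_section(text))
```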
