
regex: match exactly three lines

I want to match the following input. How would I match a group a certain number of times without using a multiline string? Something like (^(\d+) (.+)$){3} (but that doesn't work).

sample_string = """Breakpoint 12 reached 
         90  good morning
     91  this is cool
     92  this is bananas
     """
pattern_for_continue = re.compile("""Breakpoint \s (\d+) \s reached \s (.+)$
                                 ^(\d+)\s+  (.+)\n
                                 ^(\d+)\s+  (.+)\n
                                 ^(\d+)\s+  (.+)\n
                                  """, re.M|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)
print(matchobj.group(0))

There are a series of problems with your expression and sample:

  • With re.VERBOSE, literal whitespace in the pattern is ignored, so the spaces around the digits on the first line are ignored too. Replace spaces with \s or [ ] (the latter matches only a literal space; the former also matches newlines and tabs).

  • Your input sample has whitespace before the digits on each line, but your example pattern requires the digits to be at the start of the line. Either allow for that whitespace or fix your sample input.

  • The biggest problem is that a capturing group inside a repeated group (so (\d+) inside a larger group with {3} at the end) captures only the last match. You'll get 92 and this is bananas, not the two previously matched lines.
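The last point is easy to check in isolation. The following is a minimal sketch (Python 3, with a hypothetical three-line string standing in for the numbered lines of the sample, without the leading indentation) that compares one repeated group against the same line pattern written out three times:

```python
import re

# Hypothetical stand-in for the three numbered lines of the sample.
text = "90  good morning\n91  this is cool\n92  this is bananas\n"

# One group repeated with {3}: each capturing group is overwritten on
# every repetition, so only the last line survives in groups().
repeated = re.compile(r"(?:(\d+)[ ]+([^\n]+)\n){3}")
last_only = repeated.match(text).groups()

# The same line pattern concatenated three times: six separate groups.
line = r"(\d+)[ ]+([^\n]+)\n"
explicit = re.compile(line * 3)
all_lines = explicit.match(text).groups()

print(last_only)  # ('92', 'this is bananas')
print(all_lines)
```

Both patterns match the same text; they differ only in how many capture slots are available to hold the results.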

To overcome all that, you have to repeat the pattern for the three lines explicitly. You could use Python to implement that repetition:

linepattern =  r'[ ]* (\d+) [ ]+ ([^\n]+)\n'

pattern_for_continue = re.compile(r"""
    Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n
    {}
""".format(linepattern * 3), re.MULTILINE|re.VERBOSE)

Which, for your sample input, returns:

>>> pattern_for_continue.match(sample_string).groups()
('12', '', '90', 'hey this is a great line', '91', 'this is cool too', '92', 'this is bananas')

If you really do not want to match spaces before the digits on the three extra lines, you can remove the leading [ ]* from linepattern.
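If it helps, the flat tuple of groups can be regrouped into one pair per line. The following is only a sketch in Python 3: the helper name parse_breakpoint and the conversion of the numbers to int are assumptions made for illustration, and the sample string is the one from the question.

```python
import re

# Same building blocks as above; [ ] still matches a literal space
# under re.VERBOSE because it sits inside a character class.
LINE = r"[ ]* (\d+) [ ]+ ([^\n]+)\n"
PATTERN = re.compile(
    r"Breakpoint [ ]+ (\d+) [ ]+ reached [ ]+ ([^\n]*?)\n" + LINE * 3,
    re.MULTILINE | re.VERBOSE,
)

def parse_breakpoint(text):
    """Hypothetical helper: return (breakpoint_number, [(number, text), ...])."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # Group 2 is whatever follows "reached" on the first line; drop it here.
    number, _trailing, *rest = m.groups()
    lines = [(int(rest[i]), rest[i + 1]) for i in range(0, len(rest), 2)]
    return int(number), lines

sample_string = (
    "Breakpoint 12 reached \n"
    "         90  good morning\n"
    "     91  this is cool\n"
    "     92  this is bananas\n"
    "     "
)
print(parse_breakpoint(sample_string))
```

Returning None when the pattern does not match avoids the AttributeError that calling .group() on a failed match would raise.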

Code

You need something more like this:

import re

sample_string = """Breakpoint 12 reached 
90  hey this is a great line
91  this is cool too
92  this is bananas
"""
pattern_for_continue = re.compile(r"""
    Breakpoint\s+(\d+)\s+reached\s+\n
    (\d+)  ([^\n]+?)\n
    (\d+)  ([^\n]+?)\n
    (\d+)  ([^\n]+?)\n
""", re.MULTILINE|re.VERBOSE)
matchobj = pattern_for_continue.match(sample_string)

# Print each captured group, then the entire match.
for i in range(1, 8):
    print("%d %s" % (i, matchobj.group(i)))
print("Entire match:")
print(matchobj.group(0))

Result

1 12
2 90
3   hey this is a great line
4 91
5   this is cool too
6 92
7   this is bananas
Entire match:
Breakpoint 12 reached
90  hey this is a great line
91  this is cool too
92  this is bananas

Reasons

  • With re.VERBOSE, whitespace in the pattern is ignored, so any whitespace you want to match has to be written explicitly in your regex. I partially fixed this by left-justifying your data in the multiline string. I think this is justified because you probably don't have that indentation in real code; it's likely an artifact of editing in a multiline string.

  • You need to replace $ with \n.

  • You need non-greedy matches.
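The first two points can be checked with very small patterns. This is a sketch in Python 3 using made-up one-line inputs, not the original sample:

```python
import re

# Under re.VERBOSE the literal space is dropped, so r"a b" behaves like "ab".
ignored = re.compile(r"a b", re.VERBOSE)
# A space inside a character class (or \s) is kept and must match.
explicit = re.compile(r"a [ ] b", re.VERBOSE)

verbose_space = ignored.match("a b")     # None: the space is not in the pattern
verbose_nospace = ignored.match("ab")    # matches
explicit_space = explicit.match("a b")   # matches

# With re.MULTILINE, $ matches just before a newline but does not consume it,
# so the next part of the pattern still has to get past the "\n" itself.
dollar_only = re.match(r"(\d+)$(\d+)", "1\n2", re.MULTILINE)  # None
with_newline = re.match(r"(\d+)\n(\d+)", "1\n2")               # matches

print(verbose_space, verbose_nospace, explicit_space)
print(dollar_only, with_newline)
```

This is why the working pattern spells out \n at the end of each line instead of relying on $ alone.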
