
Regex to find exactly X lines

I'm trying to run some regex in Python to sort different patterns of text into different files. It turns out that over 99% of the records in my source file have a 3-line format like this:

12340987  some other text
          some text
          some text

But there's a small chance that a record will have four lines, like this:

123456789   Some text
            Some text
            some text
            one extra line of text

I was trying to write a regex to chase down all the 4-line patterns, and started out with this:

^[0-9]+([\s\S]*?)(?=^[0-9])

How can I build something with a gist like this, but only grab the 4-line pattern? Thanks for reading, and for helping if you can. :)

You could try something like this:

^[0-9]+.+$\s(?:^(?!\d).+$\s?){3}

with the g (global) and m (multiline) flags set.

See it here: https://regex101.com/r/TOoCzF/1

Explanation: ^[0-9]+.+$\s = start of line, followed by a number and then some more text, then end of line and a line break

then (?:^(?!\d).+$\s?){3} = three times a line that does not start with a digit
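
Since the question is about Python, here is a minimal sketch of how the pattern above might be applied there. Python has no g flag, so re.finditer stands in for the global search and re.MULTILINE for m. The names FOUR_LINE, EXACTLY_FOUR, four_line_records and the sample text are made up for illustration. EXACTLY_FOUR is a stricter variant that is not part of the answer: it assumes plain \n line endings and adds a trailing lookahead so that a record with a fifth line is rejected.

```python
import re

# Pattern from the answer. re.MULTILINE plays the role of the "m" flag,
# and finditer() covers the "g" (global) behaviour.
FOUR_LINE = re.compile(r"^[0-9]+.+$\s(?:^(?!\d).+$\s?){3}", re.MULTILINE)

# Hypothetical stricter variant (not from the answer): the final lookahead
# fails if yet another non-digit-leading line follows the fourth line.
EXACTLY_FOUR = re.compile(
    r"^[0-9]+.*(?:\n(?!\d).+){3}$(?!\n(?!\d).)", re.MULTILINE
)


def four_line_records(text, pattern=FOUR_LINE):
    """Return every block of text matched by the given pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


# Made-up sample: a 3-line, a 4-line, and a 5-line record.
sample = (
    "12340987  some other text\n"
    "          some text\n"
    "          some text\n"
    "123456789   Some text\n"
    "            Some text\n"
    "            some text\n"
    "            one extra line of text\n"
    "555  five-line record\n"
    "     a\n"
    "     b\n"
    "     c\n"
    "     d\n"
)

loose = four_line_records(sample)
strict = four_line_records(sample, EXACTLY_FOUR)

for block in strict:
    print(block)
    print("---")
```

On this sample the answer's pattern skips the 3-line record but also matches the first four lines of the 5-line record, because nothing in it looks past the fourth line. If records longer than four lines can occur in the real file, the stricter variant (or a check on the line that follows each match) may be the safer choice.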
