
Python regex search lines that end with a colon and all text after until next line that ends with colon

I have the following text:

Test 123:

This is a blue car

Test:

This car is not blue

This car is yellow

Hello:

This is not a test

I want to put together a regex that finds every line that starts with Test or Hello, is optionally followed by a three-digit number, and ends with a colon, and that returns all content after such a line up to the next line that fits the same description. For the text above, the findall call would return this list:

[("Test", "123", "\nThis is a blue car\n"),
 ("Test", "", "\nThis car is not blue\n\nThis car is yellow\n"),
 ("Hello", "", "\nThis is not a test")]

So far I have this:

r = re.findall(r'^(Test|Hello) *([^:]*):$', test, re.MULTILINE)

It matches each header line according to the description, but I'm unsure how to capture the content up to the next line that ends with a colon. Any ideas?
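One way to build on the per-line pattern is to keep matching only the header lines and then slice the text between consecutive matches. The sketch below is a minimal illustration, not a definitive solution. The names `HEADER` and `split_sections` are hypothetical. The pattern is also tightened to `(\d{3})?` on the assumption that the optional part really is a three-digit number.

```python
import re

# Header line: "Test" or "Hello", an optional three-digit number, then a colon.
# Assumption: the number is exactly three digits, as the description suggests.
HEADER = re.compile(r'^(Test|Hello) *(\d{3})?:$', re.MULTILINE)


def split_sections(text):
    """Return (keyword, number, body) tuples, one per header line."""
    headers = list(HEADER.finditer(text))
    sections = []
    for i, m in enumerate(headers):
        # Skip the newline that terminates the header line itself.
        body_start = min(m.end() + 1, len(text))
        if i + 1 < len(headers):
            # Stop just before the newline that precedes the next header.
            body_end = headers[i + 1].start() - 1
        else:
            body_end = len(text)
        sections.append((m.group(1), m.group(2) or "", text[body_start:body_end]))
    return sections


sample = ("Test 123:\n\nThis is a blue car\n\nTest:\n\nThis car is not blue\n\n"
          "This car is yellow\n\nHello:\n\nThis is not a test")
print(split_sections(sample))
```

Because only header lines are matched, the body text never has to be described by the regex, which keeps the pattern short and avoids relying on DOTALL.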

You could use the regex below, which relies on the DOTALL modifier:

(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)


>>> import re
>>> s = """Test 123:
... 
... This is a blue car
... 
... Test:
... 
... This car is not blue
... 
... This car is yellow
... 
... Hello:
... 
... This is not a test"""
>>> re.findall(r'(?s)(?:^|\n)(Test|Hello) *([^:]*):\n(.*?)(?=\n(?:Test|Hello)|$)', s)
[('Test', '123', '\nThis is a blue car\n'), ('Test', '', '\nThis car is not blue\n\nThis car is yellow\n'), ('Hello', '', '\nThis is not a test')]
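Note that the lookahead `\n(?:Test|Hello)` ends a section at any line that merely begins with Test or Hello, even when that line is ordinary body text. A possible variation, sketched here under the assumption that a header is exactly the keyword, an optional three-digit number, and a colon, repeats the full header shape inside the lookahead. The name `SECTION` and the second sample string are made up for illustration.

```python
import re

SECTION = re.compile(
    r'^(Test|Hello) *(\d{3})?:\n'               # header line and its newline
    r'(.*?)'                                    # body, matched lazily
    r'(?=\n(?:Test|Hello) *(?:\d{3})?:$|\Z)',   # next full header line, or end of text
    re.MULTILINE | re.DOTALL,
)

sample = ("Test 123:\n\nThis is a blue car\n\nTest:\n\nThis car is not blue\n\n"
          "This car is yellow\n\nHello:\n\nThis is not a test")
print(SECTION.findall(sample))

# A body line that starts with "Test" but is not a header stays in the body.
tricky = "Test 123:\nfirst\nTest drives are fun\nHello:\nbye"
print(SECTION.findall(tricky))
# [('Test', '123', 'first\nTest drives are fun'), ('Hello', '', 'bye')]
```

`\Z` is used instead of `$` for the end-of-text alternative because, with MULTILINE enabled, `$` would also match at the end of every line and cut each body short.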
import re

# The ur'' prefix is Python 2 only and is a syntax error in Python 3; a plain raw string works in both.
p = re.compile(r'(Test|Hello)\s*([^:]*):\n(\n.*?)(?=Test[^:]*:|Hello[^:]*:|$)', re.DOTALL | re.IGNORECASE)
test_str = "Test 123:\n\nThis is a blue car\n\nTest:\n\nThis car is not blue\n\nThis car is yellow\n\nHello:\n\nThis is not a test"

re.findall(p, test_str)

You can try this. See the demo:

http://regex101.com/r/eM1xP0/1


 