Find blocks of lines starting with a certain character
Text:
Abcd
Aefg
bhij
Aklm
bnop
Aqrs
(Note: there is no newline after the last line.)
Python code:
print(re.findall('(^A.*?$)+',Text,re.MULTILINE))
This returns:
['Abcd','Aefg','Aklm','Aqrs']
However, I would like adjacent lines to be returned as one group:
['Abcd\nAefg','Aklm','Aqrs']
How should I solve this with Python?
You may use
((?:^A.*[\n\r]?)+)
See a demo on regex101.com. This is:
(
    (?:^A.*[\n\r]?)+   # original pattern
                       # with newline characters, optionally
                       # repeat this as often as possible
)
In Python:
import re
data = """
Abcd
Aefg
bhij
Aklm
bnop
Aqrs"""
matches = [match.group(1).strip()
           for match in re.finditer(r'((?:^A.*[\n\r]?)+)', data, re.M)]
print(matches)
Which yields
['Abcd\nAefg', 'Aklm', 'Aqrs']
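The commented breakdown of the pattern can also be used more or less verbatim by compiling it with re.VERBOSE, which ignores whitespace and # comments inside the pattern. The following is only a sketch under that assumption; the names pattern, text and blocks are illustrative and not part of the original answer:

```python
import re

# Same pattern as above, written in verbose form so the comments
# live inside the regex itself.
pattern = re.compile(r"""
    (
        (?:^A.*[\n\r]?)+   # a line starting with A, plus an optional line break,
                           # repeated as often as possible
    )
""", re.MULTILINE | re.VERBOSE)

# Sample text from the question (no newline after the last line).
text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"

# strip() removes the line break that each block may have captured at its end.
blocks = [m.group(1).strip() for m in pattern.finditer(text)]
print(blocks)
```

With the sample text this should print the same three groups as the non-verbose version.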
The pattern above may eventually lead to catastrophic backtracking because of the nested quantifiers.
You may use
re.findall(r'^A.*(?:\nA.*)*', text, re.M)
See the regex demo.
Details
^ - start of a line (because of re.M)
A - the letter A
.* - the rest of the line
(?:\nA.*)* - zero or more repetitions of:
    \nA - a newline and A
    .* - the rest of the line.
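As a minimal, self-contained sketch of how this pattern might be applied to the sample text from the question: the helper name find_blocks and its char parameter are hypothetical, introduced here only to generalize the literal A to "a certain character".

```python
import re

# Sample text from the question (no newline after the last line).
text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"

def find_blocks(text, char="A"):
    """Return runs of adjacent lines that all start with `char`.

    Hypothetical helper: it builds the pattern ^A.*(?:\\nA.*)* with the
    leading character escaped, so characters like '.' or '*' also work.
    """
    c = re.escape(char)
    pattern = rf'^{c}.*(?:\n{c}.*)*'
    # No capturing groups are used, so findall returns the whole matches.
    return re.findall(pattern, text, re.M)

print(find_blocks(text))       # ['Abcd\nAefg', 'Aklm', 'Aqrs']
print(find_blocks(text, "b"))  # ['bhij', 'bnop']
```

Because each block ends at the last matching line rather than at a line break, no strip() call is needed here.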