Find blocks of lines that start with a given character

Question

Text:

Abcd
Aefg
bhij
Aklm
bnop
Aqrs

(Note: there is no newline after the last line.)

Python code:

print(re.findall('(^A.*?$)+',Text,re.MULTILINE))

This returns:

['Abcd','Aefg','Aklm','Aqrs']

However, I would like adjacent matching lines to be returned together as one item:

['Abcd\nAefg','Aklm','Aqrs']

How should I solve this with Python?

Answer 1

You may use

((?:^A.*[\n\r]?)+)

See a demo on regex101.com. This is:

(
    (?:^A.*[\n\r]?)+ # original pattern 
                     # with newline characters, optionally
                     # repeat this as often as possible
)

In Python:

import re

data = """
Abcd
Aefg
bhij
Aklm
bnop
Aqrs"""

matches = [match.group(1).strip() 
           for match in re.finditer(r'((?:^A.*[\n\r]?)+)', data, re.M)]
print(matches)

Which yields

['Abcd\nAefg', 'Aklm', 'Aqrs']

Because of the nested quantifiers, this pattern may eventually lead to catastrophic backtracking.
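The commented breakdown of the pattern shown earlier can also be used directly if it is compiled with `re.VERBOSE`, which makes the regex engine ignore the whitespace and comments inside the pattern. A minimal sketch, assuming the sample text from the question; the names `data`, `block_re` and `blocks` are illustrative only:

```python
import re

# Sample text from the question (no newline after the last line).
data = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"

# re.VERBOSE lets the pattern carry its own comments;
# re.MULTILINE makes ^ match at the start of every line.
block_re = re.compile(
    r"""
    (
        (?:^A.*[\n\r]?)+   # a line starting with A and its optional line break,
                           # repeated as often as possible
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

# strip() removes the line break captured at the end of each block.
blocks = [m.group(1).strip() for m in block_re.finditer(data)]
print(blocks)  # ['Abcd\nAefg', 'Aklm', 'Aqrs']
```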

Answer 2

You may use

re.findall(r'^A.*(?:\nA.*)*', text, re.M)

See the regex demo.

Details

^ - start of a line (with re.M, ^ matches at the start of every line, not only at the start of the string)
A - the letter A
.* - the rest of the line
(?:\nA.*)* - zero or more repetitions of
- \nA - a newline followed by A
- .* - the rest of that line.
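As a quick check, the pattern described above can be run against the sample text from the question. This is only a sketch: the helper `find_blocks` and its `prefix` parameter are hypothetical names introduced here, not part of the original answer.

```python
import re

# Sample text from the question (no newline after the last line).
text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"

def find_blocks(text, prefix="A"):
    """Return runs of consecutive lines that start with `prefix` (hypothetical helper)."""
    p = re.escape(prefix)  # escape in case the prefix is a regex metacharacter
    # One line starting with the prefix, then any number of directly
    # following lines that also start with it.
    pattern = rf"^{p}.*(?:\n{p}.*)*"
    return re.findall(pattern, text, re.M)

print(find_blocks(text))       # ['Abcd\nAefg', 'Aklm', 'Aqrs']
print(find_blocks(text, "b"))  # ['bhij', 'bnop']
```

Because the newline is only consumed when another matching line follows, no trailing line break is captured and no `strip()` call is needed.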


Answer 1: score 3, accepted, 2020-07-31 11:06:37

Answer 2: score 1, 2020-07-31 11:05:40
