Find blocks of lines starting with a certain character

Question

Text:

Abcd
Aefg
bhij
Aklm
bnop
Aqrs

(Note: there is no newline after the last line.)

Python code:

print(re.findall('(^A.*?$)+',Text,re.MULTILINE))

This returns

['Abcd','Aefg','Aklm','Aqrs']

However, I would like adjacent lines starting with A to be returned as a single match:

['Abcd\nAefg','Aklm','Aqrs']

How should I solve this with Python?

Answer 1

You may use

((?:^A.*[\n\r]?)+)

See a demo on regex101.com. Broken down, the pattern is:

(                    # capture the whole block
    (?:^A.*[\n\r]?)+ # a line starting with A (the original pattern),
                     # followed by an optional newline character,
                     # repeated one or more times
)
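The commented breakdown above can be compiled almost verbatim with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. A minimal sketch, run on the question's text; the names `BLOCK_RE`, `text` and `blocks` are illustrative only:

```python
import re

# The breakdown written as a real re.VERBOSE pattern: whitespace and
# '#' comments inside the pattern are ignored by the regex engine.
BLOCK_RE = re.compile(
    r"""
    (                       # group 1: the whole block
        (?:^A.*[\n\r]?)+    # a line starting with A plus an optional
                            # newline character, repeated one or more times
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"  # no trailing newline
# strip() drops the newline that [\n\r]? captures at the end of a block
blocks = [m.group(1).strip() for m in BLOCK_RE.finditer(text)]
print(blocks)  # ['Abcd\nAefg', 'Aklm', 'Aqrs']
```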

In Python:

import re

data = """
Abcd
Aefg
bhij
Aklm
bnop
Aqrs"""

# strip() removes the trailing newline that [\n\r]? captures at the end of a block
matches = [match.group(1).strip()
           for match in re.finditer(r'((?:^A.*[\n\r]?)+)', data, re.M)]
print(matches)

Which yields

['Abcd\nAefg', 'Aklm', 'Aqrs']

Be aware that the nested quantifiers (`.*` inside `(?:...)+`) could lead to catastrophic backtracking on some inputs.
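If backtracking is a concern, one regex-free alternative (a swapped-in technique, not part of either answer) is to group adjacent lines with `itertools.groupby`, keyed on whether each line starts with the prefix. A minimal sketch; `blocks_starting_with` is a hypothetical helper name:

```python
from itertools import groupby

def blocks_starting_with(text, prefix="A"):
    """Hypothetical helper: join runs of adjacent lines that start with prefix."""
    lines = text.splitlines()  # also handles \r\n line endings
    # groupby yields consecutive runs of lines that share the same key
    # (True if the line starts with prefix, False otherwise)
    return ["\n".join(run)
            for starts, run in groupby(lines, key=lambda line: line.startswith(prefix))
            if starts]

text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"
print(blocks_starting_with(text))  # ['Abcd\nAefg', 'Aklm', 'Aqrs']
```

Each line is inspected exactly once, so the running time is linear in the input size regardless of how the lines are arranged.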

Answer 2

You may use

re.findall(r'^A.*(?:\nA.*)*', text, re.M)

See the regex demo

Details

^ - start of a line (re.M makes ^ match after every newline, not only at the start of the string)
A - the letter A
.* - the rest of the line
(?:\nA.*)* - zero or more repetitions of:
- \nA - a newline and A
- .* - the rest of the line.
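A minimal runnable sketch of this pattern on the question's text (the variable names are illustrative). Because the pattern has no capturing group, `re.findall` returns the whole matches, and since a newline is only consumed when another A-line follows it, no `strip()` is needed:

```python
import re

text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"  # no trailing newline, as in the question

# ^A.*        : a line starting with A (re.M makes ^ match at every line start)
# (?:\nA.*)*  : followed by any number of newline + A-line continuations
pattern = re.compile(r'^A.*(?:\nA.*)*', re.M)

print(pattern.findall(text))  # ['Abcd\nAefg', 'Aklm', 'Aqrs']

# A trailing newline does not change the result: .* stops before \n, and the
# continuation group only consumes a newline that is followed by A.
print(pattern.findall(text + "\n"))  # ['Abcd\nAefg', 'Aklm', 'Aqrs']
```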

2 answers

solution1
3 ACCEPTED 2020-07-31 11:06:37

solution2
1 2020-07-31 11:05:40