Text:
Abcd
Aefg
bhij
Aklm
bnop
Aqrs
(Note, there is no newline after the last line)
Python code:
print(re.findall('(^A.*?$)+',Text,re.MULTILINE))
This returns
['Abcd','Aefg','Aklm','Aqrs']
However, I would like adjacent matching lines to be returned as a single item:
['Abcd\nAefg','Aklm','Aqrs']
How should I solve this with Python?
You may use
((?:^A.*[\n\r]?)+)
See a demo on regex101.com. Broken down:
(
    (?:^A.*[\n\r]?)+   # original pattern
                       # with newline characters, optionally
                       # repeat this as often as possible
)
In Python:
import re
data = """
Abcd
Aefg
bhij
Aklm
bnop
Aqrs"""
matches = [match.group(1).strip()
           for match in re.finditer(r'((?:^A.*[\n\r]?)+)', data, re.M)]
print(matches)
Which yields
['Abcd\nAefg', 'Aklm', 'Aqrs']
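The `.strip()` call matters here because `[\n\r]?` consumes the newline after each matched line, so the captured group can end with one. A minimal sketch of the difference, reusing the sample data from above (the names `raw` and `stripped` are only for illustration):

```python
import re

data = """
Abcd
Aefg
bhij
Aklm
bnop
Aqrs"""

pattern = r'((?:^A.*[\n\r]?)+)'

# Group 1 as matched, including any trailing newline the pattern consumed
raw = [m.group(1) for m in re.finditer(pattern, data, re.M)]

# The same matches with surrounding whitespace removed
stripped = [s.strip() for s in raw]

print(raw)       # ['Abcd\nAefg\n', 'Aklm\n', 'Aqrs']
print(stripped)  # ['Abcd\nAefg', 'Aklm', 'Aqrs']
```

The last item has no trailing newline only because the sample text does not end with one.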
The pattern above may eventually lead to catastrophic backtracking because of its nested quantifiers (a quantified `.*` inside a group that is itself repeated with `+`).
Instead, you may use
re.findall(r'^A.*(?:\nA.*)*', text, re.M)
See the regex demo
Details
^ - start of a line (with re.M, ^ matches at the start of every line, not only at the start of the string)
A - an A letter
.* - the rest of the line
(?:\nA.*)* - zero or more repetitions of
    \nA - a newline and A
    .* - the rest of the line.
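As a quick check, the pattern can be run against the sample text from the question. This is a minimal sketch; the variable names `text` and `groups` and the inline sample string are assumptions for illustration:

```python
import re

# Sample text from the question; note there is no trailing newline
text = "Abcd\nAefg\nbhij\nAklm\nbnop\nAqrs"

# ^A.*        a line starting with A
# (?:\nA.*)*  any directly following lines that also start with A
groups = re.findall(r'^A.*(?:\nA.*)*', text, re.M)

print(groups)  # ['Abcd\nAefg', 'Aklm', 'Aqrs']
```

Because a newline is only matched when another line starting with A follows it, the results carry no trailing newline and no `.strip()` call is needed. The pattern contains no capturing groups, so `re.findall` returns the whole matches.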