
Python - Regex findall repeated pattern followed by variable length of chars

I have the following pattern:
1MHG161 xxxxxxxxxxxxx 1MHG161 xxx
where each xxxx run is a variable-length sequence of characters and spaces.

I am trying to capture each one and have the following expected output:
[ '1MHG161 xxxxxxxxxxxxx ' , '1MHG161 xxx' ]

I have tried many combinations; this is the latest one:

messages_strings = re.findall("(1MHG161.+?)(?=1MHG161)",content)

This finds all except the last one.


Edit 1:

I have taken @anubhava's answer a little further to solve the same problem with dynamic delimiters, by using \d[A-Z]{3}\d{3} instead of 1MHG161.

This may help people working with EDI parsers.
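For illustration, here is a minimal sketch of that dynamic-delimiter variant. The sample string and the names `tag`, `pattern` and `messages` are made up for this example, and it assumes every segment starts with one digit, three uppercase letters and three digits, and that no such sequence appears inside a message body.

```python
import re

# Hypothetical sample data; real EDI content will look different.
content = "1MHG161 first message 2ABC123 second one 3XYZ999 end"

# Segment tag: one digit, three uppercase letters, three digits (e.g. 1MHG161).
tag = r"\d[A-Z]{3}\d{3}"

# Capture a tag plus everything up to (but not including) the next tag,
# or up to the end of the string for the final segment.
pattern = re.compile(rf"({tag}.+?)(?={tag}|$)")

messages = pattern.findall(content)
print(messages)
# ['1MHG161 first message ', '2ABC123 second one ', '3XYZ999 end']
```

If a message body can itself contain text shaped like a tag, this pattern would split there too, so a stricter tag definition would be needed.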

You can use:

>>> re.findall(r"(1MHG161.+?)(?=1MHG161|$)", content)
['1MHG161 xxxxxxxxxxxxx ', '1MHG161 xxx']

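The lookahead (?=1MHG161|$) asserts that the match is followed either by 1MHG161 or by the end of the string ($), so the lazy .+? stops just before the next delimiter or at the end of the input. Your original lookahead required another 1MHG161 after every match, which is why the last segment was skipped. Note that without re.MULTILINE, $ matches only at the end of the string (or just before a trailing newline), and . does not match newlines unless re.DOTALL is set.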
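A short, self-contained comparison of the two lookaheads, using the sample string from the question. The multi-line part at the end is an assumption about how the input might look, not something stated in the question; the names `without_end`, `with_end`, `multiline` and `multi` are illustrative only.

```python
import re

content = "1MHG161 xxxxxxxxxxxxx 1MHG161 xxx"

# Lookahead requires another 1MHG161, so the final segment never matches.
without_end = re.findall(r"(1MHG161.+?)(?=1MHG161)", content)

# Adding "|$" lets the final segment end at the end of the string.
with_end = re.findall(r"(1MHG161.+?)(?=1MHG161|$)", content)

print(without_end)  # ['1MHG161 xxxxxxxxxxxxx ']
print(with_end)     # ['1MHG161 xxxxxxxxxxxxx ', '1MHG161 xxx']

# Assumption: if segments can span several lines, re.DOTALL lets "." cross
# newlines and \Z anchors at the very end of the string.
multiline = "1MHG161 line one\ncontinued 1MHG161 last\n"
multi = re.findall(r"(1MHG161.+?)(?=1MHG161|\Z)", multiline, flags=re.DOTALL)
print(multi)  # ['1MHG161 line one\ncontinued ', '1MHG161 last\n']
```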
