
Capture data between a repeating string in Python

I want to extract all the data between occurrences of a repeating string.

The text looks like this:

map report for 0

...................
..............
lot of data in between
.....
......

map report for 1

My regex for this is

map = re.findall(r"map report for(.+?)\S*\W*map", filestring, re.S)

This only returns the sections whose header has an even number after the search string (I presume each odd-numbered section is getting included in the preceding even-numbered match).

Any workarounds?

You should consider using split instead of findall for this. It seems to be closer to what you have in mind:

re.split(r'map report for \d+\n', filestring)

This simplifies things greatly in your case.
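A minimal sketch of the split approach, assuming the file contents are already in a string. The sample text is made up for illustration, and the capturing group around \d+ is an addition (not part of the suggestion above) that makes re.split return the header numbers as well.

```python
import re

# Made-up sample standing in for the real file contents (an assumption).
filestring = (
    "map report for 0\n"
    "alpha beta\n"
    "map report for 1\n"
    "gamma delta\n"
    "map report for 2\n"
    "epsilon zeta\n"
)

# Plain split: the piece before the first header is an empty string, so drop it.
sections = [s for s in re.split(r"map report for \d+\n", filestring) if s]

# With a capturing group, re.split also returns the captured header numbers,
# so numbers and bodies alternate after the leading empty piece.
parts = re.split(r"map report for (\d+)\n", filestring)
by_number = dict(zip(parts[1::2], parts[2::2]))

print(sections)
print(by_number)
```

Unlike the findall attempts, this also returns the last section, since nothing has to follow it.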

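Your regex consumes the word map that begins the next header, so that header can no longer start a match of its own and only every other section is found. You need a lookahead: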

map=re.findall(r"map report for(.+?)\S*\W*(?=map)",filestring,re.S)

This way it checks that your match is followed by map, but map itself is not consumed.
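To see the difference, here is a small comparison of the two patterns from this thread. The sample text is invented for the demonstration; only the two regexes come from the posts above.

```python
import re

# Made-up sample with four headers (an assumption, not the real file).
filestring = (
    "map report for 0\nalpha beta\n"
    "map report for 1\ngamma delta\n"
    "map report for 2\nepsilon zeta\n"
    "map report for 3\neta theta\n"
)

# Original pattern: the trailing "map" is consumed by each match.
consuming = re.findall(r"map report for(.+?)\S*\W*map", filestring, re.S)

# Lookahead pattern: the trailing "map" is only checked, not consumed.
lookahead = re.findall(r"map report for(.+?)\S*\W*(?=map)", filestring, re.S)

# Each capture starts with the header number, so pull that out for comparison.
consuming_numbers = [m.split()[0] for m in consuming]
lookahead_numbers = [m.split()[0] for m in lookahead]

print(consuming_numbers)  # ['0', '2']
print(lookahead_numbers)  # ['0', '1', '2']
```

Two things to watch for with either pattern: the last section is never returned, because no further map follows it, and on this sample the \S*\W* part takes the last word of each block out of the captured group.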

The regex I would use is something like this:

(map report for \d+)(.*?)\1

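The \1 is a backreference to the first group: it tries to match, at the end, exactly the text that group captured, so unlike the other approaches the data in between can itself contain the string map. Be aware that this means the closing header must repeat the opening one verbatim, number included. With headers numbered 0, 1, 2, ... the backreference will not match the next header, and a lookahead such as (?=map report for \d+) is needed instead. The re.S flag is also required so that .*? can span newlines.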
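A short check of how the backreference behaves, using two invented samples. The second pattern is a hypothetical variant, not taken from the answer above: it swaps the backreference for a lookahead that accepts any header or the end of the string.

```python
import re

# Made-up samples (assumptions for illustration only).
numbered = "map report for 0\nalpha\nmap report for 1\nbeta\n"
repeated = "map report for 0\nalpha\nmap report for 0\nbeta\n"

backref = re.compile(r"(map report for \d+)(.*?)\1", re.S)

# \1 must repeat the captured header verbatim, so differing numbers never match.
numbered_match = backref.search(numbered)

# When the very same header text occurs again, the text in between is captured.
repeated_body = backref.search(repeated).group(2)

# Hypothetical variant: stop before the next header of any number, or at the
# end of the string (\Z), so the last section is returned as well.
section_re = re.compile(
    r"map report for (\d+)\n(.*?)(?=map report for \d+\n|\Z)", re.S
)
sections = section_re.findall(numbered)

print(numbered_match)
print(repr(repeated_body))
print(sections)
```

The variant returns (number, body) pairs, which keeps each block tied to the header it came from.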


 