I have an issue with re.findall. For example:
text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'
re.findall('(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)',text)
What I want is to extract ["[1]xxxxxxxx","[2]xxxxxxxx","[3]xxxxxx","[4]xxxxxxxxx"].
However, when I ran re.findall('(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)',text)
I got only ['[1]xxxxxxxx', '[3]xxxxxx'].
How can I get all four items?
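The non-capturing group, (?:...), does not create a separate memory buffer for the text it matches, but it still consumes that text, i.e. the text is added to the match value and the regex index is advanced past it. The [2] that ends the first match is therefore already used up when the next search starts, so the item beginning with [2] is skipped, and the same happens with [4].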
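To see the consumption directly, here is a small sketch that prints the whole match and its span for each hit of the original pattern. The variable names (pattern, consumed) are mine, not from the question.

```python
import re

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'
pattern = r'(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)'

# group(0) is the whole match, including what the non-capturing group consumed
consumed = [(m.group(0), m.span()) for m in re.finditer(pattern, text)]
for whole, span in consumed:
    print(whole, span)
# [1]xxxxxxxx[2] (0, 14)
# [3]xxxxxx[4] (22, 34)
```

The first match ends at index 14, after the [2], so the next search can only start from the x characters that follow it.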
You need a non-consuming pattern here, a positive lookahead:
re.findall(r'\[\d{1,2}\].*?(?=\[\d{1,2}\]|end)', text)
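The (?=\[\d{1,2}\]|end) pattern matches a location that is immediately followed by [, one or two digits and then ], or by the end char sequence. Since a lookahead is zero-width, that text is only checked and not consumed, so the next match can start right at it.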
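As a quick check, the sketch below runs both patterns on the sample text from the question and also collects the spans of the lookahead matches. The names with_group, with_lookahead and spans are only for illustration.

```python
import re

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'

# Original pattern: the trailing delimiter is consumed by (?:...)
with_group = re.findall(r'(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)', text)

# Lookahead pattern: the trailing delimiter is only checked, not consumed
lookahead = r'\[\d{1,2}\].*?(?=\[\d{1,2}\]|end)'
with_lookahead = re.findall(lookahead, text)
spans = [m.span() for m in re.finditer(lookahead, text)]

print(with_group)      # ['[1]xxxxxxxx', '[3]xxxxxx']
print(with_lookahead)  # ['[1]xxxxxxxx', '[2]xxxxxxxx', '[3]xxxxxx', '[4]xxxxxxxxx']
print(spans)           # [(0, 11), (11, 22), (22, 31), (31, 43)]
```

Each span starts exactly where the previous one ends, which is why no item is skipped.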