How do I use Python findall to extract a common part?
I have an issue with re.findall.
For example:
text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'
re.findall('(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)',text)
What I want is to extract
["[1]xxxxxxxx","[2]xxxxxxxx","[3]xxxxxx","[4]xxxxxxxxx"]
However, when I ran
re.findall('(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)',text)
I got
['[1]xxxxxxxx', '[3]xxxxxx']
Any help with this would be appreciated.
The non-capturing group, (?:...), does not create a separate memory buffer for the matched text, but it still consumes that text: it is added to the match value and the regex index advances past it. Here, the [2] that ends the first match is already consumed, so the next match can only start at [3].
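To see the consumption concretely, here is a small sketch that runs the question's pattern through re.finditer and compares the whole match with the captured group. The variable names (pattern, matches) are only for illustration.

```python
import re

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'
pattern = r'(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)'

# group(0) is the whole match, including the text consumed by the
# non-capturing group; group(1) is the captured part that findall returns.
matches = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text)]
for whole, captured in matches:
    print(repr(whole), '->', repr(captured))
```

The first whole match already includes [2] and the second includes [4], so neither marker is available as the start of another match.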
You need a non-consuming pattern here, a positive lookahead:
re.findall(r'\[\d{1,2}\].*?(?=\[\d{1,2}\]|end)', text)
See the regex demo.
The (?=\[\d{1,2}\]|end) pattern matches a location that is immediately followed by [, one or two digits, and then ], or by the end character sequence.
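As a quick check, the lookahead version can be run against the text from the question. This is a minimal sketch; the name result is only for illustration.

```python
import re

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'

# The lookahead only checks that "[N]" or "end" comes next; it does not
# consume it, so the following match can start right at that marker.
result = re.findall(r'\[\d{1,2}\].*?(?=\[\d{1,2}\]|end)', text)
print(result)
# ['[1]xxxxxxxx', '[2]xxxxxxxx', '[3]xxxxxx', '[4]xxxxxxxxx']
```

Note that end is matched wherever it appears, so an item whose text itself contains "end" would be cut short at that point.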