How do I use Python findall to extract the common parts?

Question

I have an issue with re.findall.

For example:

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'
re.findall('(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)',text)

What I want is to extract ["[1]xxxxxxxx","[2]xxxxxxxx","[3]xxxxxx","[4]xxxxxxxxx"].

However, when I ran re.findall('(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)',text)

I got ['[1]xxxxxxxx', '[3]xxxxxx']

Any help with this question?

Answer 1

The non-capturing group, (?:...), does not create a separate memory buffer for the text it matches, but it still consumes that text, i.e. the text is added to the match value and the regex index is advanced past it. Here that means the [2] that ends the first match has already been consumed, so it cannot start the second match, and the search resumes after it. The same thing happens with [4].
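To see the consumption directly, here is a small sketch that prints the full match next to the captured group for the pattern from the question. The variable names (pattern, full_matches, captured) are only illustrative.

```python
import re

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'
pattern = r'(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)'

# group(0) is the whole match, including what (?:...) consumed;
# group(1) is only the captured part that findall returns.
full_matches = [m.group(0) for m in re.finditer(pattern, text)]
captured = [m.group(1) for m in re.finditer(pattern, text)]

print(full_matches)  # ['[1]xxxxxxxx[2]', '[3]xxxxxx[4]']
print(captured)      # ['[1]xxxxxxxx', '[3]xxxxxx']
```

The full matches end in [2] and [4], which is why those two items never get a chance to start a match of their own.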

You need a non-consuming pattern here, a positive lookahead:

re.findall(r'\[\d{1,2}\].*?(?=\[\d{1,2}\]|end)', text)

See the regex demo.

The (?=\[\d{1,2}\]|end) pattern matches a location that is immediately followed by [, one or two digits and then ], or by the end character sequence. Because a lookahead only checks what follows and consumes nothing, the next match can start at that same [.
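As a quick check, the two patterns can be run side by side on the sample text from the question. This is a minimal sketch; the variable names consuming and lookahead are just for illustration.

```python
import re

text = '[1]xxxxxxxx[2]xxxxxxxx[3]xxxxxx[4]xxxxxxxxxend'

# Original pattern: the non-capturing group consumes the next marker.
consuming = re.findall(r'(\[\d{1,2}\].*?)(?:\[\d{1,2}\]|end)', text)

# Lookahead pattern: the next marker is checked but not consumed.
# With no capturing group, findall returns the full matches.
lookahead = re.findall(r'\[\d{1,2}\].*?(?=\[\d{1,2}\]|end)', text)

print(consuming)  # ['[1]xxxxxxxx', '[3]xxxxxx']
print(lookahead)  # ['[1]xxxxxxxx', '[2]xxxxxxxx', '[3]xxxxxx', '[4]xxxxxxxxx']
```

Note that .*? does not match newlines by default, so if the items may span several lines you would probably want to pass re.DOTALL as well.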
