Python 3 regex: find the start and end index of all overlapping matches in a string
This was my original approach:
import re
string = '1' * 15
result = re.finditer(r'(?=11111)', string)  # overlapped = True
# Doesn't work for me
for i in result:  # python 3.5
    print(i.start(), i.end())
It finds all overlapping matches, but fails to get the right end index. The output:
1 <_sre.SRE_Match object; span=(0, 0), match=''>
2 <_sre.SRE_Match object; span=(1, 1), match=''>
3 <_sre.SRE_Match object; span=(2, 2), match=''>
4 <_sre.SRE_Match object; span=(3, 3), match=''>
(and so on..)
My question: how can I find all overlapping matches and also get the correct start and end index for each one?
The problem comes from the fact that a lookahead is a zero-width assertion: it consumes no text, i.e. it adds nothing to the match result. It is a mere position in the string. Thus, each of your matches starts and ends at the same location in the string.
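To make the zero-width behaviour concrete, here is a small sketch that inspects the matches produced by the bare lookahead from the question. The variable names (`matches`, `spans`, `all_empty`) are illustrative and not part of the original post.

```python
import re

s = '1' * 15

# A bare lookahead only asserts that '11111' follows; it consumes nothing.
matches = list(re.finditer(r'(?=11111)', s))

# Each match is the empty string, so start() equals end() at every position.
spans = [m.span() for m in matches]
all_empty = all(m.group(0) == '' for m in matches)

print(spans[:3])   # [(0, 0), (1, 1), (2, 2)]
print(all_empty)   # True
```

The start positions are already the ones you want; only the end index is lost, because the matched text itself is empty.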
You need to enclose the lookahead pattern in a capturing group (i.e. (?=(11111))) and access the start and end of group 1 (with i.start(1) and i.end(1)):
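import re
s = '1' * 15
result = re.finditer(r'(?=(11111))', s)
for i in result:
    # Group 1 holds the text captured inside the lookahead.
    # Printed as a tuple so that Python 3 shows it as (start, end).
    print((i.start(1), i.end(1)))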
See the Python demo; its output is:
(0, 5)
(1, 6)
(2, 7)
(3, 8)
(4, 9)
(5, 10)
(6, 11)
(7, 12)
(8, 13)
(9, 14)
(10, 15)
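If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name `overlapping_spans` is hypothetical, and it assumes the pattern does not rely on numbered backreferences (wrapping it in an extra group shifts the group numbers). It also reports at most one match per start position, namely the one the regex engine prefers there.

```python
import re

def overlapping_spans(pattern, text):
    """Return (start, end) for each overlapping match of pattern in text."""
    # Wrap the pattern in a lookahead containing a capturing group, so the
    # engine advances one position at a time while group 1 keeps the text.
    wrapped = re.compile('(?=(' + pattern + '))')
    return [m.span(1) for m in wrapped.finditer(text)]

print(overlapping_spans('11111', '1' * 15)[:3])  # [(0, 5), (1, 6), (2, 7)]
print(overlapping_spans('aba', 'ababa'))         # [(0, 3), (2, 5)]
```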
Can you compare your code with this implementation and see where the differences might be?
import re
match = re.finditer(r'111', 'test111 end111 and another 111')
for i in match:
    print(i.start(), i.end())
If this is not working for you, kindly share a sample of your data.
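For comparison, note that a plain pattern without a lookahead consumes each match, so the search resumes after the end of the previous match and overlapping occurrences are skipped. A minimal sketch on the string from the question (the names `plain` and `lookahead` are illustrative):

```python
import re

s = '1' * 15

# Plain pattern: each match consumes five characters, so matches cannot overlap.
plain = [m.span() for m in re.finditer(r'11111', s)]

# Lookahead with a capturing group: one match per start position.
lookahead = [m.span(1) for m in re.finditer(r'(?=(11111))', s)]

print(plain)           # [(0, 5), (5, 10), (10, 15)]
print(len(lookahead))  # 11
```

The plain form is fine when the occurrences in your data never overlap, as in the 'test111 end111 and another 111' sample.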