How to find all matches with regex when some of the matches overlap

Question

I have a long.txt file. I want to find all the matching results with regex.

For example:

import re
test_str = 'ali. veli. ahmet.'
src = re.finditer(r'(\w+\.\s){1,2}', test_str, re.MULTILINE)
print(*src)

This code returns:

<re.Match object; span=(0, 11), match='ali. veli. '>

I need:

['ali. veli', 'veli. ahmet.']

How can I do that with regex?

Answer 1

The (\w+\.\s){1,2} pattern contains a repeated capturing group, and Python re does not store all the captures it finds; it only saves the last one into the group memory buffer. At any rate, you do not need the repeated capturing group: you need to extract multiple occurrences of the pattern from a string, and re.finditer or re.findall will do that for you.
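A quick way to see the "only the last capture is kept" behaviour is to inspect the match object directly. This is a small sketch using the question's test string; the variable names are illustrative only.

```python
import re

test_str = 'ali. veli. ahmet.'

# One match covers 'ali. veli. ', but group 1 only remembers its last iteration.
m = re.match(r'(\w+\.\s){1,2}', test_str)
whole_match = m.group(0)   # 'ali. veli. '
last_capture = m.group(1)  # 'veli. ' (the 'ali. ' capture was overwritten)

# With exactly one group in the pattern, re.findall returns group 1 of each
# match, so the same loss shows up there as well.
findall_result = re.findall(r'(\w+\.\s){1,2}', test_str)  # ['veli. ']

print(whole_match, last_capture, findall_result)
```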

Also, the re.MULTILINE flag is not necessary here since there are no ^ or $ anchors in the pattern.
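The claim about re.MULTILINE can be checked with a two-line string. The sample text below is hypothetical; it only shows that the flag changes what ^ matches and leaves an anchor-free pattern unaffected.

```python
import re

text = 'ali.\nveli.'  # hypothetical two-line input

# No anchors in the pattern: re.MULTILINE makes no difference.
no_anchor_plain = re.findall(r'\w+\.', text)
no_anchor_multi = re.findall(r'\w+\.', text, re.MULTILINE)

# With ^, the flag decides whether ^ also matches right after each newline.
anchored_plain = re.findall(r'^\w+', text)                # ['ali']
anchored_multi = re.findall(r'^\w+', text, re.MULTILINE)  # ['ali', 'veli']

print(no_anchor_plain, no_anchor_multi, anchored_plain, anchored_multi)
```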

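Since the results you want overlap (veli belongs to both of them), the pattern has to sit in a capturing group inside a positive lookahead: a lookahead consumes no text, so the regex engine can try a match at every position in the string. You may get the expected results using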

import re
test_str = 'ali. veli. ahmet.'
src = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)
print(src)
# => ['ali. veli', 'veli. ahmet']

See the Python demo.
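To make the role of the lookahead concrete, here is a small sketch that runs the same pair pattern once as a normal (consuming) match and once wrapped in the lookahead, and then uses re.finditer to also report where each pair starts. The variable names are illustrative.

```python
import re

test_str = 'ali. veli. ahmet.'

# Consuming version: once 'ali. veli' is matched, scanning resumes after 'veli',
# so the overlapping 'veli. ahmet' can never be found.
consuming = re.findall(r'\b\w+\.\s+\w+', test_str)  # ['ali. veli']

# Lookahead version: each match is zero-width, so the engine advances one
# character at a time and can find a pair starting inside the previous one.
overlapping = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)  # ['ali. veli', 'veli. ahmet']

# The same pattern with re.finditer also yields the start offset of each pair.
with_positions = [(m.start(), m.group(1))
                  for m in re.finditer(r'(?=\b(\w+\.\s+\w+))', test_str)]
# [(0, 'ali. veli'), (5, 'veli. ahmet')]

print(consuming, overlapping, with_positions)
```

For the long.txt file from the question, the file contents would first be read into a string (for example with open('long.txt', encoding='utf-8').read(); the encoding is an assumption) and that string passed in place of test_str.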

The pattern means:

- (?= - start of a positive lookahead
- \b - a word boundary (crucial here: capturing must only start at a word boundary, otherwise word suffixes such as li. veli would be captured as well)
- (\w+\.\s+\w+) - Capturing group 1: 1+ word chars, a literal . character, 1+ whitespaces and 1+ word chars
- ) - end of the lookahead.
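The effect of \b is easy to see by dropping it. This sketch runs the lookahead pattern with and without the word boundary on the question's string; without it, a match can also start in the middle of a word.

```python
import re

test_str = 'ali. veli. ahmet.'

# With \b, a pair may only start at the beginning of a word.
with_boundary = re.findall(r'(?=\b(\w+\.\s+\w+))', test_str)

# Without \b, every suffix of a word that is followed by '. word' also matches.
without_boundary = re.findall(r'(?=(\w+\.\s+\w+))', test_str)

print(with_boundary)
print(without_boundary)
```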

Answer 1 (3, accepted, 2020-05-16 22:34:26)