
python regex - findall not returning output as expected

I am having trouble understanding findall, which says...

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Why doesn't this basic IP regex work with findall as expected? The matches do not overlap, and regexpal confirms that the pattern in re_str matches both addresses (they are highlighted).


Expected: ['1.2.2.3', '123.345.34.3']

Actual: ['2.', '34.']

import re

re_str = r'(\d{1,3}\.){3}\d{1,3}'    # the repeated group is a capturing group
line = 'blahblah -- 1.2.2.3 blah 123.345.34.3'
matches = re.findall(re_str, line)
print(matches)    # ['2.', '34.']

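This is because a repeated capturing group keeps only the text from its last repetition, and findall returns the contents of the group rather than the whole match when the pattern contains a group.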
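To see the "last repetition" behaviour in isolation, here is a minimal sketch that runs the question's pattern against a single address with re.search. The variable names are only illustrative.

```python
import re

pattern = r'(\d{1,3}\.){3}\d{1,3}'
m = re.search(pattern, '1.2.2.3')

whole_match = m.group(0)   # the full text matched by the pattern
last_repeat = m.group(1)   # group 1 matched '1.', '2.', '2.' in turn; only the last is kept

print(whole_match)   # 1.2.2.3
print(last_repeat)   # 2.
```

The whole match is correct; findall simply reports group 1 instead of group 0 because a group is present.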

Instead, you should make the repeating group non-capturing, and use a non-repeated capture at an outer layer:

re_str = r'((?:\d{1,3}\.){3}\d{1,3})'

Note that for findall, if the pattern contains no capturing group, the whole match is returned automatically (like \0), so you could drop the outer capture:

re_str = r'(?:\d{1,3}\.){3}\d{1,3}'
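As a quick check, both corrected patterns can be run against the line from the question. This is a small sketch; the names outer_capture and no_capture are just labels chosen for the example.

```python
import re

line = 'blahblah -- 1.2.2.3 blah 123.345.34.3'

# Non-capturing repeated group wrapped in one outer capturing group
outer_capture = r'((?:\d{1,3}\.){3}\d{1,3})'
# No capturing group at all: findall returns the whole match
no_capture = r'(?:\d{1,3}\.){3}\d{1,3}'

result_outer = re.findall(outer_capture, line)
result_plain = re.findall(no_capture, line)

print(result_outer)   # ['1.2.2.3', '123.345.34.3']
print(result_plain)   # ['1.2.2.3', '123.345.34.3']
```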

When you use parentheses in your regex, re.findall() returns only the parenthesized groups, not the entire matched string. Put ?: right after the ( to make the group non-capturing, and the results will be the entire matched strings.
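If the capturing group has to stay in the pattern for some other reason, one possible alternative (not mentioned in the answers above) is to iterate with re.finditer and take group 0 of each match object. A minimal sketch:

```python
import re

re_str = r'(\d{1,3}\.){3}\d{1,3}'   # original pattern, capturing group left in place
line = 'blahblah -- 1.2.2.3 blah 123.345.34.3'

# Each match object exposes the whole match as group(0), regardless of groups
matches = [m.group(0) for m in re.finditer(re_str, line)]
print(matches)   # ['1.2.2.3', '123.345.34.3']
```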
