Apart from returning a list of strings versus an iterator, do re.findall() and re.finditer() in Python also work differently?

I wrote the following code to get all variable-length runs of the pattern in str_key.

import re

line = "ABCDABCDABCDXXXABCDXXABCDABCDABCD"
str_key = "ABCD"
regex = rf"({str_key})+"

find_all_found = re.findall(regex,line)
print(find_all_found)

find_iter_found = re.finditer(regex, line)
for i in find_iter_found:
    print(i.group())

Output I got:

['ABCD', 'ABCD', 'ABCD']
ABCDABCDABCD
ABCD
ABCDABCDABCD

The intended output is the last three lines, printed by finditer(). I was expecting both functions to give me the same output (list or iterator does not matter). Why does findall() differ? As far as I understood from other posts on Stack Overflow, these two functions differ only in their return types and not in how they match patterns. Do they work differently? If not, what have I done wrong?

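findall() returns the contents of the capture group, not the whole match. To see the same values from finditer(), access groups() rather than group().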

>>> find_iter_found = re.finditer(regex, line)
>>> for i in find_iter_found:
...     print(i.groups()[0])
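
As a minimal sketch using the question's own data, the following compares what group() and group(1) return for each match object. The variable names are illustrative only. It suggests that findall() reports the capture group, while group() with no argument reports the whole match.

```python
import re

line = "ABCDABCDABCDXXXABCDXXABCDABCDABCD"
regex = r"(ABCD)+"

# group() with no argument is the entire text matched by the pattern.
whole_matches = [m.group() for m in re.finditer(regex, line)]

# group(1) is the text captured by the first (and only) capture group.
# A repeated group keeps only its last repetition.
first_groups = [m.group(1) for m in re.finditer(regex, line)]

print(whole_matches)  # ['ABCDABCDABCD', 'ABCD', 'ABCDABCDABCD']
print(first_groups)   # ['ABCD', 'ABCD', 'ABCD']
print(first_groups == re.findall(regex, line))  # True
```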

The difference between the two methods is explained here.

As far as the matching process is concerned, the two functions behave in much the same way, according to the documentation:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

re.finditer(pattern, string, flags=0)

Return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.
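
The rule quoted above for findall() ("if one or more groups are present in the pattern, return a list of groups") can be illustrated with a small sketch. The sample text and patterns here are made up for the illustration and are not from the question.

```python
import re

text = "a1b2"

# No capture group: findall returns the whole matches.
no_group = re.findall(r"[a-z]\d", text)

# One capture group: findall returns only what that group captured.
one_group = re.findall(r"([a-z])\d", text)

# Two capture groups: findall returns a list of tuples, one per match.
two_groups = re.findall(r"([a-z])(\d)", text)

# A non-capturing group (?:...) does not count as a group here.
non_capturing = re.findall(r"(?:[a-z])\d", text)

print(no_group)       # ['a1', 'b2']
print(one_group)      # ['a', 'b']
print(two_groups)     # [('a', '1'), ('b', '2')]
print(non_capturing)  # ['a1', 'b2']
```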

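For re.findall, change your regex from

  • regex = rf"({str_key})+"

to

  • regex = rf"((?:{str_key})+)"

The quantifier + has to be inside the capture group. With the quantifier outside, the group keeps only its last repetition, which is why findall() returned 'ABCD' for every match.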
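
A short sketch of the corrected pattern applied to the question's data follows. The names from_findall and from_finditer are illustrative. With the quantifier inside the capture group, both functions should report the same strings.

```python
import re

line = "ABCDABCDABCDXXXABCDXXABCDABCDABCD"
str_key = "ABCD"

# The outer group captures the whole run; the inner (?:...) only groups
# str_key for repetition without creating a second capture group.
regex = rf"((?:{str_key})+)"

from_findall = re.findall(regex, line)
from_finditer = [m.group() for m in re.finditer(regex, line)]

print(from_findall)   # ['ABCDABCDABCD', 'ABCD', 'ABCDABCDABCD']
print(from_finditer)  # ['ABCDABCDABCD', 'ABCD', 'ABCDABCDABCD']
```

Dropping the outer capture group entirely, as in rf"(?:{str_key})+", would also make findall() return whole matches. If str_key might contain regex metacharacters, wrapping it in re.escape() would be a sensible precaution.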
