
Why use regex finditer() rather than findall()

What is the advantage of using finditer() if findall() is good enough? findall() returns all of the matches as a list, while finditer() returns an iterator of match objects, which can't be processed as directly as a static list.

For example:

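import re
CARRIS_REGEX = (r'<th>(\d+)</th><th>([\s\w\.\-]+)</th>'
                r'<th>(\d+:\d+)</th><th>(\d+m)</th>')
pattern = re.compile(CARRIS_REGEX, re.UNICODE)
# Read the whole file and close it again when done.
with open("test.txt") as f:
    mailbody = f.read()
# finditer() yields one match object per match.
for match in pattern.finditer(mailbody):
    print(match)
print()
# findall() returns a list of tuples, one tuple of groups per match.
for match in pattern.findall(mailbody):
    print(match)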

Output:

<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>
<_sre.SRE_Match object at 0x00A63758>
<_sre.SRE_Match object at 0x00A63F98>

('790', 'PR. REAL', '21:06', '04m')
('758', 'PORTAS BENFICA', '21:10', '09m')
('790', 'PR. REAL', '21:14', '13m')
('758', 'PORTAS BENFICA', '21:21', '19m')
('790', 'PR. REAL', '21:29', '28m')
('758', 'PORTAS BENFICA', '21:38', '36m')
('758', 'SETE RIOS', '21:49', '47m')
('758', 'SETE RIOS', '22:09', '68m')

I ask this out of curiosity.

finditer() returns an iterator, while findall() returns a list. An iterator only does work when it is asked for its next item, via next() (the .next() method in Python 2). A for loop makes that call for you, so if you break out of the loop early, the remaining matches are never searched for. A list, on the other hand, must be fully populated, meaning every match has to be found up front.
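To make the lazy behaviour concrete, here is a minimal sketch. The sample text and the simple \d+ pattern are made up for illustration and are not taken from the question.

```python
import re

pattern = re.compile(r"\d+")
text = "10 apples, 20 pears, 30 plums, 40 figs"  # illustrative sample text

# findall() scans the whole string and returns a fully built list.
all_numbers = pattern.findall(text)

# finditer() returns an iterator; each match is searched for on demand.
matches = pattern.finditer(text)
first = next(matches)  # only the first match has been located so far
print(first.group(), first.span())

# Breaking out early: the text after the second number is never scanned.
found = None
for m in pattern.finditer(text):
    if int(m.group()) >= 20:
        found = m
        break
print(found.group(), found.start())
```

If only the first match satisfying some condition matters, the finditer() loop stops as soon as that match is seen, whereas findall() would have collected all four numbers first.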

Iterators can be far more memory and CPU efficient, since they only need to hold one item at a time. If you were matching against a very large string (an encyclopedia can be several hundred megabytes of text), trying to find all matches at once could make the program hang while it searched, and it could potentially run out of memory.
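A rough sketch of the memory point: the repeated string below is only a stand-in for a much larger document, and the \w+ pattern is an assumption chosen for illustration. Note that the text itself still sits in memory in both cases; what finditer() avoids is the list of all results.

```python
import re

pattern = re.compile(r"\w+")
text = "spam eggs " * 1000  # stand-in for a much larger document

# findall() builds a list of all 2000 strings before len() can run.
count_list = len(pattern.findall(text))

# finditer() lets us aggregate while only one match object is alive at a time.
count_lazy = sum(1 for _ in pattern.finditer(text))

print(count_list, count_lazy)
```

Both approaches give the same count; the difference is only in how many results exist at once.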

Sometimes it's superfluous to retrieve all matches. If the number of matches is really high you could risk filling up your memory loading them all.

Using iterators or generators is an important concept in modern Python. That said, if you have a small text (e.g. this web page), the optimization is minuscule.
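Even on a small text, the match objects from finditer() are not hard to process. The sketch below reuses the regex from the question on made-up sample rows (the real contents of test.txt are not shown, so mailbody here is an assumption): match.groups() gives the same tuples as findall(), and match.span() additionally reports where each match was found.

```python
import re

CARRIS_REGEX = (r'<th>(\d+)</th><th>([\s\w\.\-]+)</th>'
                r'<th>(\d+:\d+)</th><th>(\d+m)</th>')
pattern = re.compile(CARRIS_REGEX)

# Assumed sample input; the question does not show the real file.
mailbody = (
    '<tr><th>790</th><th>PR. REAL</th><th>21:06</th><th>04m</th></tr>'
    '<tr><th>758</th><th>PORTAS BENFICA</th><th>21:10</th><th>09m</th></tr>'
)

# groups() returns the captured groups as a tuple, like findall() does.
rows = [m.groups() for m in pattern.finditer(mailbody)]

# span() gives the (start, end) offsets, which findall() cannot provide.
spans = [m.span() for m in pattern.finditer(mailbody)]

for row, span in zip(rows, spans):
    print(row, span)
```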

Here is a related question about iterators: Performance Advantages to Iterators?
