
Use with in __iter__

I need to open a text file, read it line by line, and return only the lines that contain numbers.

Is it a good idea to use a with statement in __iter__? Like this:

def __iter__(self):
    with open(file_name) as fp:
        for i in fp:
            if is_number(i):
                yield i

Or is this a better way:

def __enter__(self):
    self._fp = open(self._file, 'r')
    return self

def __exit__(self, exc_type, exc_val, exc_tb):
    self._fp.close()

def __iter__(self):
    for tracker_id in self._fp:
        if re.search(r'\d', tracker_id):
            yield int(tracker_id)

You need a generator, rather than a context manager. To create one you could try something like this:

import re

def filter_lines(filename: str, pattern: str):
    p = re.compile(pattern)  # compile once, reuse for every line
    with open(filename) as f:
        for line in f:
            if p.search(line):
                yield line

if __name__ == "__main__":
    for line in filter_lines('myfile.txt', r'\d'):
        print(line, end='')  # each line already ends with a newline

Remember to compile your regex patterns if you're going to use them more than once.

I think the second form of the code is better.

The first version is dependent on the iterator returned by __iter__ only existing as long as the iteration is going on. If something happens to break out of the iteration without deallocating the iterator, then the file could be left open indefinitely.

Using it like this is mostly safe. If an exception happens in the loop body, the object and its iterator will be garbage collected, since the for loop itself holds the only reference to the iterator (on interpreters other than CPython, which do not rely on reference counting, the cleanup may be delayed, or may not happen at all if garbage collection is turned off):

for x in Whatever():  # assuming your methods are in a class named Whatever
    # do stuff

This alternative use is probably not safe, as the iterator is referenced from a stack frame that might live on for quite some time while an exception is being handled:

it = iter(Whatever())

for x in it:
    # do stuff
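
As a rough illustration of the point above, here is a minimal sketch. The class name NumberLines, the last_fp attribute, and the sample file contents are made up for the demonstration and are not from the question. It shows that a file opened inside a generator stays open while a reference to the suspended generator is held, and is only closed once the generator is closed explicitly (or collected):

```python
import os
import tempfile


class NumberLines:
    """Hypothetical iterable written in the style of the first version."""

    def __init__(self, path):
        self._path = path
        self.last_fp = None  # exposed only so the example can inspect the file

    def __iter__(self):
        with open(self._path) as fp:
            self.last_fp = fp
            for line in fp:
                if any(ch.isdigit() for ch in line):
                    yield line


# Build a small sample file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("abc\n12\nxyz\n34\n")

source = NumberLines(path)
it = iter(source)
for line in it:
    break  # abandon the iteration early while still holding a reference to `it`

open_after_break = not source.last_fp.closed  # the with block has not exited yet
it.close()  # raises GeneratorExit inside the generator, so the with block exits
closed_after_close = source.last_fp.closed

os.remove(path)
```

Calling close() on the generator is what reference counting does implicitly in CPython once the last reference goes away; the sketch just makes that step visible.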

The second form of your code makes it explicit that the calling code is responsible for ensuring the resources get cleaned up properly. You'd call it with something like this, and can be confident that the file will be closed if an exception gets raised:

with Whatever() as w:
    for x in w:
        # do stuff
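
To make that concrete, here is a self-contained sketch of the second form. The class name TrackerFile, the constructor, and the sample data are assumptions added for the example. It raises an exception in the loop body and then checks that the file was closed anyway:

```python
import os
import re
import tempfile


class TrackerFile:
    """Hypothetical stand-in for the second version of the class."""

    def __init__(self, path):
        self._path = path
        self._fp = None

    def __enter__(self):
        self._fp = open(self._path, "r")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._fp.close()
        return False  # do not swallow exceptions from the with body

    def __iter__(self):
        for line in self._fp:
            if re.search(r"\d", line):
                yield int(line)


# Build a small sample file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("10\nskip\n20\n30\n")

seen = []
tracker = TrackerFile(path)
try:
    with tracker as w:
        for value in w:
            seen.append(value)
            if value == 20:
                raise RuntimeError("simulated failure in the loop body")
except RuntimeError:
    pass

closed_after_error = tracker._fp.closed  # __exit__ ran despite the exception
os.remove(path)
```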

The main downside of the second version is that you can't run more than one iteration over the same object at the same time, since every iterator shares the same file object. Anyone who wants to iterate over the same file twice needs to create a separate instance of the class for each pass.

The one-use nature of the object might be more natural if it were an iterator itself, rather than just iterable (this is how file objects work, for instance):
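
The shared-file problem can be seen in a small sketch. The helper name numbers and the sample file are hypothetical; the generator plays the role of __iter__ in the second version. Two iterators over one file object do not each see every line, they take turns consuming it:

```python
import os
import tempfile


def numbers(fp):
    """Hypothetical generator mirroring __iter__ from the second version."""
    for line in fp:
        if any(ch.isdigit() for ch in line):
            yield int(line)


# Build a small sample file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("1\n2\n3\n")

with open(path) as fp:
    first = numbers(fp)
    second = numbers(fp)
    a = next(first)   # reads the first line
    b = next(second)  # continues from the shared position, not from the top
    c = next(first)   # the line read by `second` is skipped here

os.remove(path)
```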

import re

class Whatever:
    def __init__(self, file):
        self._file = file  # path of the file to read

    def __enter__(self):
        self._fp = open(self._file, 'r')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._fp.close()

    def __iter__(self):
        return self

    def __next__(self):
        # skip lines without digits; StopIteration from the file propagates
        tracker_id = next(self._fp)
        while re.search(r'\d', tracker_id) is None:
            tracker_id = next(self._fp)
        return int(tracker_id)

Note that we are deliberately not attempting to catch any StopIteration exception that might be raised by calling next on our file, as that will be our signal that we're done too.

In the first case, the file is opened when iteration is requested. That may incur extra I/O if multiple iterations are done. In the second case, the file is always opened when the object is used in a with statement, even if no iteration is done.

There are tradeoffs - one approach might be more efficient depending on how the object is used. If you need to support diverse usage patterns, you may want to combine the approaches. Lazily open the file the first time iteration is requested and then close it in __exit__ . If you don't need that flexibility then choose the option that best fits how the object is likely to be used.
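
A minimal sketch of that combined approach might look like the following. The class name LazyNumberFile, the open_count counter, and the choice to rewind with seek(0) on later iterations are all assumptions made for the example, not something stated in the answer:

```python
import os
import re
import tempfile


class LazyNumberFile:
    """Hypothetical class: opens lazily in __iter__, closes in __exit__."""

    def __init__(self, path):
        self._path = path
        self._fp = None
        self.open_count = 0  # only here so the example can count opens

    def __enter__(self):
        return self  # nothing is opened until iteration is requested

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._fp is not None:
            self._fp.close()
            self._fp = None
        return False

    def __iter__(self):
        if self._fp is None:
            self._fp = open(self._path)
            self.open_count += 1
        else:
            self._fp.seek(0)  # assumption: later iterations restart from the top
        for line in self._fp:
            if re.search(r"\d", line):
                yield line.rstrip("\n")


# Build a small sample file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("abc\n12\nxyz\n34\n")

with LazyNumberFile(path) as f:
    opened_on_enter = f._fp is not None  # still unopened at this point
    first_pass = list(f)
    fp_ref = f._fp
    second_pass = list(f)  # reuses the already open file

closed_on_exit = fp_ref.closed
opens = f.open_count
os.remove(path)
```

With this design, a with block that never iterates costs no I/O, and repeated sequential iterations reuse one file handle. Concurrent iterations would still interfere with each other, because they share the same file object.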
