Applying the list function to nested generators in itertools.groupby

How does the list function behave when applied to nested generators? I find the behaviour in the following snippet puzzling: list seems to consume all of the nested generators except the last one, which still keeps one element:

>>> from itertools import groupby
>>> xs = [1, 2, 2, 3, 3]
>>> for k, g in list(groupby(xs)):
...     print(k, list(g))
1 []
2 []
3 [3]

No, a call to list will not consume a nested iterator/generator.
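
As a quick check of that claim, here is a minimal sketch using plain generators (the names count_up, outer and inners are made up for illustration). list drains only the outer generator; each inner generator is still untouched afterwards.

```python
def count_up(n):
    """Yield 0, 1, ..., n - 1 (hypothetical helper for this example)."""
    for i in range(n):
        yield i

# Outer generator that yields independent inner generators.
outer = (count_up(n) for n in (1, 2, 3))

# list() exhausts only the outer generator; the inner ones are not touched.
inners = list(outer)

# Each inner generator still produces all of its items.
contents = [list(g) for g in inners]
print(contents)  # [[0], [0, 1], [0, 1, 2]]
```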

The behavior is peculiar to itertools.groupby and is described in the docs:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.

[ Emphasis mine ]
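
The effect described in that passage can be seen by advancing the groupby object by hand. This is only a small sketch, and the variable names are illustrative:

```python
from itertools import groupby

gb = groupby([1, 2, 2, 3, 3])

k1, g1 = next(gb)  # first group (key 1)
k2, g2 = next(gb)  # advancing gb moves the shared iterator past group 1

stale = list(g1)    # the earlier group has nothing left to yield
current = list(g2)  # the current group is still intact

print(k1, stale)    # 1 []
print(k2, current)  # 2 [2, 2]
```

Only the group that is current at the time you read it is guaranteed to be complete. What a stale group yields is an implementation detail, and the last group in particular may behave differently across Python versions.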

If you look at the pure-Python equivalent of itertools.groupby given in the docs, this becomes clearer:

class groupby(object):
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key

        self.it = iter(iterable) # shared iterator

        self.tgtkey = self.currkey = self.currvalue = object()

    def __iter__(self):
        return self

    def __next__(self):    # named next() in Python 2
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))

    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return    # a bare StopIteration inside a generator is an error since Python 3.7
            self.currkey = self.keyfunc(self.currvalue)

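The [3] that shows up last in your result is self.currvalue (yielded by _grouper), which had already been assigned by an earlier call to next on the groupby object. The two earlier groups come back empty because, by the time they are read, self.currkey no longer equals their tgtkey.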

To keep the contents of each group, store them in a list as you iterate, rather than consuming the groupby object all at once.
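
A minimal sketch of that advice, reusing xs from the question: each group is turned into a list inside the loop, before groupby moves on to the next key. The name groups is just for illustration.

```python
from itertools import groupby

xs = [1, 2, 2, 3, 3]

# Materialise each group while it is still the current one.
groups = [(k, list(g)) for k, g in groupby(xs)]

for k, items in groups:
    print(k, items)
# 1 [1]
# 2 [2, 2]
# 3 [3, 3]
```

Once groups has been built this way, it can be iterated as often as needed, because it holds ordinary lists rather than iterators tied to the shared source.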

