
Python generator of generators?

I wrote a class that reads a txt file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:

line1.1
line1.2
line1.3

line2.1
line2.2

My first implementation was to read the whole file and return a list of lists, that is a list of sections, where each section is a list of lines. This was obviously terrible memory-wise.

So I re-implemented it as a generator of lists: on each iteration my class reads a whole section into memory as a list and yields it.
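For reference, the current version looks roughly like this (a simplified sketch of MyClass; the details are illustrative, not my exact code):

class MyClass:
    """Yield each blank-line-separated section of a file as a list of lines."""
    def __init__(self, file_handle):
        self.file_handle = file_handle

    def __iter__(self):
        section = []
        for line in self.file_handle:
            line = line.rstrip("\n")
            if line:               # non-empty line: part of the current section
                section.append(line)
            elif section:          # empty line: the current section is complete
                yield section
                section = []
        if section:                # the last section may not end with an empty line
            yield section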

This is better, but it's still problematic in the case of large sections. So I wonder: can I reimplement it as a generator of generators? The problem is that this class is very generic, and it should be able to satisfy both of these use cases:

  1. read a very big file, containing very big sections, and cycle through it only once. A generator of generators is perfect for this.
  2. read a smallish file into memory to be cycled over multiple times. A generator of lists works fine, because the user can just invoke

    list(MyClass(file_handle))

However, a generator of generators would NOT work in case 2, as the inner objects would not be transformed to lists.

Is there anything more elegant than implementing an explicit to_list() method, that would transform the generator of generators into a list of lists?

Python 2:

map(list, generator_of_generators)

Python 3 (map is lazy there, so wrap it in list):

list(map(list, generator_of_generators))

or for both:

[list(gen) for gen in generator_of_generators]
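For instance, with a toy generator of generators (the names here are only for illustration):

def inner(i):
    yield i * 10
    yield i * 10 + 1

def gen_of_gens():
    for i in range(3):
        yield inner(i)

print([list(gen) for gen in gen_of_gens()])
#>>> [[0, 1], [10, 11], [20, 21]]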

Since the generated objects are generator functions, not mere generators, you'd want to do

[list(gen()) for gen in generator_of_generator_functions]

If that doesn't work, I have no idea what you're asking. Also, why would it yield generator functions and not generators themselves?


Since in the comments you said you wanted to keep list(generator_of_generator_functions) from crashing mysteriously, the answer depends on what you really want.

  • It is not possible to override the behaviour of list in this way: either you store the sub-generator's elements or you don't.

  • If you really do get a crash, I recommend having the main generator exhaust the sub-generator on every iteration. This is standard practice and exactly what itertools.groupby, a stdlib generator of generators, does (see the groupby sketch after this list).

e.g.

def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = innergen()
        yield r

        # Exhaust the sub-generator before starting the next outer iteration,
        # so an unconsumed sub-generator can never get out of step with it.
        for _ in r: pass
  • Or use a dark, secret hack method that I'll show in a mo' (I need to write it), but don't do it!
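For reference, this is what the same "consume it before advancing" contract looks like with itertools.groupby; splitting the question's blank-line-separated lines this way is only an illustration, not code from the question:

from itertools import groupby

lines = ["line1.1", "line1.2", "line1.3", "", "line2.1", "line2.2"]

# groupby yields (key, sub-iterator) pairs; each sub-iterator is only valid
# until the outer iterator is advanced, so consume (or copy) it right away.
for is_content, section in groupby(lines, key=lambda line: bool(line.strip())):
    if is_content:
        print(list(section))
#>>> ['line1.1', 'line1.2', 'line1.3']
#>>> ['line2.1', 'line2.2']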

As promised, the hack (for Python 3, this time 'round):

from collections import UserList


# Build a stand-in that defers to object's implementation of the method named
# `key`; it is used below to shadow UserList's methods so that none of them
# runs before the hack has swapped the class.
def objectitemcaller(key):
    def inner(*args, **kwargs):
        try:
            return getattr(object, key)(*args, **kwargs)
        except AttributeError:
            return NotImplemented
    return inner


class Listable(UserList):
    def __init__(self, iterator):
        self.iterator = iterator
        self.iterated = False

    def __iter__(self):
        return self

    def __next__(self):
        self.iterated = True
        return next(self.iterator)

    def _to_list_hack(self):
        # Materialise whatever is left of the iterator into self.data...
        self.data = list(self)
        del self.iterated
        del self.iterator
        # ...then literally turn this instance into a plain UserList.
        self.__class__ = UserList

# Shadow every UserList method that Listable does not define itself, so the
# wrapper behaves like a bare iterator until _to_list_hack has run.
for key in UserList.__dict__.keys() - Listable.__dict__.keys():
    if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
        setattr(Listable, key, objectitemcaller(key))


def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = Listable(innergen())
        yield r

        if not r.iterated:
            # The consumer never touched the sub-generator: snapshot it into
            # a list so that list(metagen()) gives a list of lists.
            r._to_list_hack()
        else:
            # Otherwise just exhaust it, as in the non-hack version above.
            for item in r: pass

for item in metagen():
    print(item)
    print(list(item))
#>>> <Listable object at 0x7f46e4a4b850>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b950>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b990>
#>>> [1, 2, 3]

list(metagen())
#>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]

It's so bad I don't want to even explain it.

The key is that you have a wrapper that can detect whether it has been iterated, and if not you run a _to_list_hack that, I kid you not, changes the __class__ attribute.

Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.
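The layout restriction is why a pure-Python base class is needed at all; this toy check (not part of the original answer) shows what happens if you try to assign __class__ straight to list:

class Wrapper:
    pass

w = Wrapper()
try:
    w.__class__ = list   # instance layouts differ, so CPython raises TypeError
except TypeError:
    print("cannot reassign __class__ to list")
#>>> cannot reassign __class__ to list

Reassigning between two ordinary Python classes such as Listable and UserList is allowed, which is what _to_list_hack relies on.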

Basically, please don't use this hack. You can enjoy it as humour, though.

A rather pragmatic way would be to tell the "generator of generators" upon creation whether to generate generators or lists. While this is not as convenient as having list magically know what to do, it still seems to be more comfortable than having a special to_list function.

def gengen(n, listmode=False):
    for i in range(n):
        def gen():
            for k in range(i+1):
                yield k
        yield list(gen()) if listmode else gen()

Depending on the listmode parameter, this can be used to generate either generators or lists.

for gg in gengen(5, False):
    print(gg, list(gg))
print(list(gengen(5, True)))
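Applied to the file format from the question, the same switch can be combined with itertools.groupby; this is only a sketch (read_sections and the file name are made up, not code from either answer):

from itertools import groupby

def read_sections(file_handle, listmode=False):
    """Yield each blank-line-separated section, as a sub-iterator or as a list."""
    stripped = (line.rstrip("\n") for line in file_handle)
    for is_content, section in groupby(stripped, key=bool):
        if is_content:
            yield list(section) if listmode else section

# Streaming, one pass, low memory:
#     with open("data.txt") as fh:
#         for section in read_sections(fh):
#             for line in section:
#                 ...
#
# Smallish file, cycled over multiple times:
#     with open("data.txt") as fh:
#         sections = list(read_sections(fh, listmode=True))

In streaming mode the usual groupby caveat applies: each section is only valid until the outer loop advances.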
