
Creating iterators from a generator returns the same object

Let's say I have a large list of data that I want to perform some operation on, and I would like to have multiple iterators performing this operation independently.

data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1 = iter(generator)
it2 = iter(generator)

I would expect these iterators to be different objects, but it1 is it2 returns True. More confusingly, the same is true for the following generators as well:

import copy

# copied data
gen = ((e, 2*e) for e in copy.deepcopy(data))
# temp object
gen = ((e, 2*e) for e in [1,2,3,4,5])

This means in practice that when I call next(it1), it2 is advanced as well, which is not the behavior I want.

What is going on here, and is there any way to do what I'm trying to do? I am using python 2.7 on Ubuntu 14.04.

Edit:

I just tried out the following as well:

gen = (e for e in [1,2,3,4,5])
it = iter(gen)
next(it)
next(it)
for e in gen:
    print e

This prints 3 4 5. Apparently generators are a more constrained concept than I had imagined.

Generators are iterators. All well-behaved iterators have an __iter__ method that should simply return self.

From the docs:

The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:

iterator.__iter__() Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.

iterator.__next__() Return the next item from the container. If there are no further items, raise the StopIteration exception. This method corresponds to the tp_iternext slot of the type structure for Python objects in the Python/C API.

So, consider another example of an iterator:

>>> x = [1, 2, 3, 4, 5]
>>> it = iter(x)
>>> it2 = iter(it)
>>> next(it)
1
>>> next(it2)
2
>>> it is it2
True

So, again, a list is iterable because it has an __iter__ method that returns an iterator. This iterator also has an __iter__ method, which should always return the iterator itself, but it also has a __next__ method (spelled next in Python 2).
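To make the protocol concrete, here is a minimal sketch of a hand-written iterator. The class name Countdown and its behavior are hypothetical, chosen only for illustration; the point is that __iter__ returns self, so calling iter() on the iterator gives back the same object, exactly as with a generator.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so iter(it) is it.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    # Python 2 looks for a method named next instead of __next__.
    next = __next__


it = Countdown(3)
same = iter(it)      # the very same object, not a fresh iterator
first = next(it)     # advances the shared state
second = next(same)  # continues where the previous call stopped
```

Because same and it are one object, the two next() calls walk a single sequence rather than two independent ones.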

So, consider:

>>> x = [1, 2, 3, 4, 5]
>>> it = iter(x)
>>> hasattr(x, '__iter__')
True
>>> hasattr(x, '__next__')
False
>>> hasattr(it, '__iter__')
True
>>> hasattr(it, '__next__')
True
>>> next(it)
1
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator

And for a generator:

>>> g = (x**2 for x in range(10))
>>> g
<generator object <genexpr> at 0x104104390>
>>> hasattr(g, '__iter__')
True
>>> hasattr(g, '__next__')
True
>>> next(g)
0

Now, you are using generator expressions. But you can just use a generator function. The most straightforward way to accomplish what you are doing is just to use:

def paired(data):
    for e in data:
        yield (e, 2*e)

Then use:

it1 = paired(data)
it2 = paired(data)

In this case, it1 and it2 are two separate iterator objects, because each call to paired creates a new generator.
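A short check of that independence, as a sketch that reuses the paired function and the data list from the question (the variable names a, b, c and distinct are only for illustration):

```python
def paired(data):
    # Each call builds a fresh generator with its own position in data.
    for e in data:
        yield (e, 2 * e)


data = [1, 2, 3, 4, 5]
it1 = paired(data)
it2 = paired(data)

distinct = it1 is not it2  # two separate generator objects
a = next(it1)  # first item from it1
b = next(it1)  # second item from it1
c = next(it2)  # it2 still starts at the beginning
```

Note that both generators read from the same underlying list, so no data is copied; only the iteration state is separate.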

You are using the same generator for both iterators. Calling iter(thing) returns the result of the thing's __iter__ method if it has one, and a generator's __iter__ returns the generator itself, so iter(generator) returns the same object both times you call it. See https://docs.python.org/3/library/stdtypes.html#generator-types

data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1 = iter(generator)
it2 = iter(generator)

type(it1)
generator

Here are two ways of getting independent iterators:

import itertools
data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1, it2 = itertools.tee(generator)
type(it1)
itertools._tee

or:

data = [1,2,3,4,5]
it1 = ((e, 2*e) for e in data)
it2 = ((e, 2*e) for e in data)
type(it1)
generator

Both solutions produce this:

next(it1)
(1, 2)
next(it2)
(1, 2)
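One caveat with the itertools.tee approach, sketched below with the same data: the tee iterators share the original generator and buffer items that one of them has consumed and the other has not. If one iterator runs far ahead, that buffer can grow to hold most of the data, and the original generator should not be used directly once tee has been called. The names first_pass, second_pass and leftover are only for illustration.

```python
import itertools

data = [1, 2, 3, 4, 5]
generator = ((e, 2 * e) for e in data)
it1, it2 = itertools.tee(generator)

# Draining it1 pulls every item out of the underlying generator;
# tee keeps those items buffered so that it2 can still yield them.
first_pass = list(it1)
second_pass = list(it2)

# The original generator has been exhausted by the tee iterators.
leftover = list(generator)
```

For a large list that is already in memory, creating two separate generators over the list avoids this extra buffering.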
