Confusion about iterators and iterables in Python

I am currently reading the official documentation for Python 3.5.

It states that range() is iterable, and that list() and for are iterators. [section 4.3]

However, here it states that zip() makes an iterator.

My question is: when we use this instruction,

list(zip(list1, list2))

are we using an iterator ( list() ) to iterate through another iterator?

The documentation is creating some confusion here, by re-using the term 'iterator'.

There are three components to the iterator protocol:

  1. Iterables; things you can potentially iterate over and get their elements, one by one.

  2. Iterators; things that do the iteration. Every time you want to step through all items of an iterable, you need one of these to keep track of where you are in the process. These are not re-usable; once you reach the end, that's it. For most iterables, you can create multiple independent iterators, each tracking its own position.

  3. Consumers of iterators; those things that want to do something with the items.

A for loop is an example of the latter, so #3. A for loop uses the iter() function to produce an iterator (#2 above) for whatever you want to loop over, so that "whatever" must be an iterable (#1 above).
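As a rough sketch of what the for loop does with those pieces (the real loop is implemented inside the interpreter, and the helper name manual_for and its body parameter are made up purely for illustration), the steps look something like this:

```python
def manual_for(iterable, body):
    """Roughly mimic `for item in iterable: body(item)`.

    `manual_for` and `body` are hypothetical names used only for illustration.
    """
    iterator = iter(iterable)      # ask the iterable (#1) for an iterator (#2)
    while True:
        try:
            item = next(iterator)  # advance the iterator by one step
        except StopIteration:      # the iterator signals that it is exhausted
            break
        body(item)                 # the consumer (#3) does something with the item


collected = []
manual_for(range(3), collected.append)
print(collected)  # [0, 1, 2]
```

The only things the loop needs from its target are that iter() works on it and that the resulting iterator eventually raises StopIteration.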

range() is an example of #1; it is an iterable object. You can iterate over it multiple times, independently:

>>> r = range(5)
>>> r_iter_1 = iter(r)
>>> next(r_iter_1)
0
>>> next(r_iter_1)
1
>>> r_iter_2 = iter(r)
>>> next(r_iter_2)
0
>>> next(r_iter_1)
2

Here r_iter_1 and r_iter_2 are two separate iterators, and each time you ask for a next item they do so based on their own internal bookkeeping.

list() is an example of both an iterable (#1) and an iteration consumer (#3). If you pass another iterable (#1) to the list() call, a list object is produced containing all elements from that iterable. But list objects themselves are also iterables.
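A small illustration of both roles, using only built-ins (the variable names are arbitrary):

```python
numbers = list(range(3))        # the list() call consumes the range iterable (#3)
print(numbers)                  # [0, 1, 2]

# The resulting list object is itself an iterable (#1), so it can hand out
# several independent iterators (#2):
first_pass = iter(numbers)
second_pass = iter(numbers)
print(next(first_pass))         # 0
print(next(first_pass))         # 1
print(next(second_pass))        # 0
```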

zip(), in Python 3, takes in multiple iterables (#1), and is itself an iterator (#2). zip() stores a new iterator (#2) for each of the iterables you gave it. Each time you ask zip() for the next element, zip() builds a new tuple with the next element from each of the stored iterators:

>>> lst1, lst2 = ['foo', 'bar'], [42, 81]
>>> zipit = zip(lst1, lst2)
>>> next(zipit)
('foo', 42)
>>> next(zipit)
('bar', 81)

So in the end, list(zip(list1, list2)) uses both list1 and list2 as iterables (#1). zip() consumes those (#3) while it is itself being consumed by the outer list() call.
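To make that chain concrete, here is a simplified pure-Python stand-in for zip(), written as a generator. The name simple_zip is hypothetical, and the built-in is implemented in C, so treat this as a sketch of the behaviour described above rather than the actual implementation:

```python
def simple_zip(*iterables):
    """Simplified stand-in for zip(); the name is hypothetical."""
    iterators = [iter(it) for it in iterables]   # one iterator (#2) per iterable (#1)
    if not iterators:
        return                                   # no inputs: produce nothing
    while True:
        items = []
        for iterator in iterators:
            try:
                items.append(next(iterator))     # pull the next element from each input
            except StopIteration:
                return                           # stop as soon as any input runs out
        yield tuple(items)


list1, list2 = ['foo', 'bar'], [42, 81]
pairs = simple_zip(list1, list2)    # nothing has been pulled from the lists yet
print(list(pairs))                  # [('foo', 42), ('bar', 81)]
print(list(pairs))                  # [] because the iterator is now exhausted
```

Nothing is pulled from list1 or list2 until the outer list() call starts asking for items, and a second list() call gets nothing because the iterator has already reached its end.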

The documentation is badly worded. Here's the section you're referring to:

We say such an object is iterable, that is, suitable as a target for functions and constructs that expect something from which they can obtain successive items until the supply is exhausted. We have seen that the for statement is such an iterator. The function list() is another; it creates lists from iterables:

In this paragraph, iterator does not refer to a Python iterator object, but the general idea of "something which iterates over something". In particular, the for statement cannot be an iterator object because it isn't an object at all; it's a language construct.

To answer your specific question:

... when we use this instruction:

 list(zip(list1, list2)) 

are we using an iterator ( list() ) to iterate through another iterator?

No, list() is not an iterator. It's the constructor for the list type. It accepts any iterable (including an iterator) as an argument, and uses that iterable to construct a list.

zip() is an iterator function, that is, a function which returns an iterator. In your example, the iterator it returns is passed to list(), which constructs a list object from it.

A simple way to tell whether an object is an iterator is to call next() with it, and see what happens:

>>> list1 = [1, 2, 3]
>>> list2 = [4, 5, 6]

>>> zipped = zip(list1, list2)
>>> zipped
<zip object at 0x7f27d9899688>
>>> next(zipped)
(1, 4)

In this case, the next element of zipped is returned.

>>> list3 = list(zipped)
>>> list3
[(2, 5), (3, 6)]

Notice that only the last two elements of the iterator are found in list3, because we already consumed the first one with next().

>>> next(list3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator

This doesn't work, because lists are not iterators.

>>> next(zipped)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

This time, although zipped is an iterator, calling next() with it raises StopIteration because it was already exhausted when we constructed list3.
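If you would rather not consume an element just to find out, one alternative (not part of the original answer) is to test against the abstract base classes in collections.abc. Note that the Iterable check only detects objects that define __iter__(), so it can miss objects that are iterable solely through __getitem__():

```python
from collections.abc import Iterable, Iterator

list1 = [1, 2, 3]
zipped = zip(list1, [4, 5, 6])

print(isinstance(list1, Iterable))     # True: a list can be iterated over
print(isinstance(list1, Iterator))     # False: but it is not an iterator itself
print(isinstance(zipped, Iterable))    # True: every iterator is also iterable
print(isinstance(zipped, Iterator))    # True
print(iter(zipped) is zipped)          # True: an iterator returns itself from iter()
print(isinstance(range(5), Iterator))  # False: range objects are iterables only
```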
