Preventing a generator from yielding the same object twice

Assuming I have a generator yielding hashable values (str, int, etc.), is there a way to prevent the generator from yielding the same value twice?

I'm using a generator precisely so that I don't have to consume all the values up front, so something like yield from set(some_generator) is not an option: it would exhaust the entire generator before yielding anything.

Example:

# Current result
for x in my_generator():
    print(x)

>>> 1
>>> 17
>>> 15
>>> 1   # <-- This shouldn't be here
>>> 15  # <-- Neither should this!
>>> 3
>>> ...

# Wanted result
for x in my_no_duplicate_generator():
    print(x)

>>> 1
>>> 17
>>> 15
>>> 3
>>> ...

What's the most Pythonic solution for this?

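You can wrap the generator in another generator that remembers the values it has already yielded:

def my_no_duplicate_generator(iterable):
    seen = set()  # values yielded so far
    for x in iterable:
        # Only yield values that have not been seen before.
        if x not in seen:
            yield x
            seen.add(x)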

You can use it by passing your generator as an argument:

for x in my_no_duplicate_generator(my_generator()):
    print(x)

The recipes section of the Python itertools documentation includes unique_everseen, which is roughly equivalent to @NikosOikou's answer. Note that it is a recipe to copy into your own code, not a function you can import from itertools.

The main drawback of these solutions is that they assume the elements of the iterable are hashable:
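As a rough illustration of that recipe, here is a sketch written in the spirit of unique_everseen. It is paraphrased rather than copied verbatim from the itertools documentation, so treat the details as an approximation. The optional key argument deduplicates on a derived value (for example a lowercased string) while still yielding the original element.

```python
from itertools import filterfalse


def unique_everseen(iterable, key=None):
    """Lazily yield each element the first time it appears, preserving order.

    Sketch paraphrased from the itertools recipes; not a verbatim copy.
    """
    seen = set()
    if key is None:
        # filterfalse skips every element for which seen.__contains__ is true.
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element


print(list(unique_everseen([1, 17, 15, 1, 15, 3])))       # [1, 17, 15, 3]
print(list(unique_everseen("ABBcCAD", key=str.lower)))    # ['A', 'B', 'c', 'D']
```

Because it is itself a generator, it only pulls values from the underlying iterable as they are requested, at the cost of keeping every distinct value (or key) in memory.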

>>> L = [[1], [2,3], [1]]
>>> seen = set()
>>> for e in L: seen.add(e)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

The more-itertools module refines the implementation to accept unhashable elements, and its documentation gives a tip on how to keep good speed in some cases (disclaimer: I'm the "author" of the tip).

You can check the source code.
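To give an idea of how unhashable elements can be handled, here is a minimal sketch of the general approach: try the set first, and fall back to a plain list when hashing fails. The function name unique_everseen_any is made up for this example, and the code is an illustration under that assumption, not the actual more-itertools source. The list fallback makes each membership test linear, so unhashable items are slow; passing a key such as tuple, which turns each list into a hashable value, keeps lookups in the set.

```python
def unique_everseen_any(iterable, key=None):
    """Yield first occurrences, accepting unhashable elements too.

    Hypothetical helper for illustration; not the more-itertools code.
    """
    seen_hashable = set()   # fast path for hashable keys
    seen_unhashable = []    # slow fallback: linear search per lookup
    for element in iterable:
        k = element if key is None else key(element)
        try:
            # Raises TypeError if k is unhashable (e.g. a list).
            if k in seen_hashable:
                continue
            seen_hashable.add(k)
        except TypeError:
            if k in seen_unhashable:
                continue
            seen_unhashable.append(k)
        yield element


L = [[1], [2, 3], [1]]
print(list(unique_everseen_any(L)))             # [[1], [2, 3]]
# With key=tuple every key is hashable, so the slow list is never used.
print(list(unique_everseen_any(L, key=tuple)))  # [[1], [2, 3]]
```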
