Assuming I have a generator yielding hashable values (`str`, `int`, etc.), is there a way to prevent the generator from yielding the same value twice?

Obviously I'm using a generator so I don't want to unpack all the values first, so something like `yield from set(some_generator)` is not an option, since that would exhaust the entire generator.
Example:

```python
# Current result
for x in my_generator():
    print(x)
>>> 1
>>> 17
>>> 15
>>> 1   # <-- This shouldn't be here
>>> 15  # <-- This neither!
>>> 3
>>> ...

# Wanted result
for x in my_no_duplicate_generator():
    print(x)
>>> 1
>>> 17
>>> 15
>>> 3
>>> ...
```
What's the most Pythonic solution for this?
You can try this:

```python
def my_no_duplicate_generator(iterable):
    seen = set()
    for x in iterable:
        if x not in seen:
            yield x
            seen.add(x)
```
You can use it by passing your generator as an argument:

```python
for x in my_no_duplicate_generator(my_generator()):
    print(x)
```
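Because the wrapper filters lazily, it even works on an infinite source. A quick sketch (the infinite generator expression here is a hypothetical stand-in for `my_generator`):

```python
import itertools

def my_no_duplicate_generator(iterable):
    # Remember every value already yielded; skip repeats.
    seen = set()
    for x in iterable:
        if x not in seen:
            yield x
            seen.add(x)

# An infinite stream of last digits of squares: 0, 1, 4, 9, 6, 5, 6, 9, ...
squares_mod_10 = (n * n % 10 for n in itertools.count())

# islice pulls only the first six unique values; nothing is unpacked up front.
print(list(itertools.islice(my_no_duplicate_generator(squares_mod_10), 6)))
# [0, 1, 4, 9, 6, 5]
```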
There is a `unique_everseen` recipe in the Python `itertools` module documentation that is roughly equivalent to @NikosOikou's answer.
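For reference, one published version of that recipe looks like this; the optional `key` argument lets you deduplicate on a derived value:

```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBcCAD', str.lower) --> A B c D
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element
```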
The main drawback of these solutions is that they rely on the hypothesis that the elements of the iterable are hashable:

```python
>>> L = [[1], [2,3], [1]]
>>> seen = set()
>>> for e in L: seen.add(e)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
```
The more-itertools module refines the implementation to accept unhashable elements, and its documentation gives a tip on how to keep good speed in some cases (disclaimer: I'm the "author" of the tip). You can check the source code.
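The idea behind the tip can be sketched without the dependency: map each unhashable element to a hashable projection (here `tuple`) so a fast `set` membership test can still be used. `unique_by_key` is a hypothetical helper for illustration, not the actual more-itertools source:

```python
def unique_by_key(iterable, key):
    # Deduplicate unhashable elements via a hashable projection of each one.
    seen = set()
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element

L = [[1], [2, 3], [1]]
# Lists aren't hashable, but their tuple projections are.
print(list(unique_by_key(L, key=tuple)))  # [[1], [2, 3]]
```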