A common pattern when working with a set is the following:
number_list = [1,5,7,2,4,4,1,3,8,5]
number_set = set()
for number in number_list:
#we only want to process the number if we haven't already processed it
if(number not in number_set):
number_set.add(number)
#do processing of 'number' here now that we know it's not a duplicate
The lines if(number not in number_set):
and number_set.add(number)
bug me because we're doing two hash lookups here, when realistically we should only need one.
Dictionaries have the "setdefault" operation, which solves a very similar problem: "If the key exists in the dictionary, return the value, otherwise insert this default and then return the default". If you do this naively, IE the following, you perform two hash lookups, but setdefault allows you to do it in one
if item_key in dict:
dict[item_key].append(item_value)
else:
dict[item_key] = [item_value]
Is there an equivalent operation for sets? Something like if(number_set.check_if_contains_and_then_add(number)):
but given a much nicer name.
No there is not.
The setdefault
method is used to set the default value of a key in dictionaries, sets don't have values so that is completely pointless.
Try this instead if the order doesn't matter.
number_list = [1,5,7,2,4,4,1,3,8,5]
number_set = set(number_list)
for number in number_set:
#do processing of 'number' here now that we know it's not a duplicate
If the profiler tells you that hash lookups contribute significant runtime, then this might work around it.
def add_value(container, value):
oldlen = len(container)
container.add(value)
return len(container) != oldlen
if add_value(number_set, number):
# process number
But why would that be? Perhaps due to a slow __hash__
method, although I can tell you now that (a) hashing integers isn't slow and (b) if you possibly can, it's better to make the class with the slow __hash__
cache the result instead of reducing the number of calls. Or perhaps due to a slow __eq__
, which is harder to deal with. Finally if the internal lookup mechanism itself is slow, then there may not be a great deal you can do to speed your program up, because the runtime is doing hash lookups all the time, finding names in scopes.
It would probably be nice for set.add
to return a value indicating whether or not the set changed, but I think that idea runs up against a principle of the Python libraries (admittedly not universally upheld) that mutating operations don't return a value unless it's fundamental to the operation to do so. So pop()
functions return a value of course, but list.sort()
returns None
even though it would occasionally be useful to users if it returned self
.
I suppose you could do something like this:
def deduped(iterable):
seen = set()
count = 0
for value in iterable:
seen.add(value)
if count != len(seen):
count += 1
yield value
for number in deduped(number_list):
# process number
Of course it's pure speculation that the repeated hash lookup is any kind of problem: I would normally write either of those functions with the if not in
test as in your original code, and the purpose of the function would be to simplify the calling code, not to avoid superfluous hash lookups.
Why wouldn't you just do number_set.add(number)
? The point of setdefault is that it won't overwrite the existing value for a key, if it exists. But a set doesn't have a value, just a key, so overwriting is irrelevant.
No there's no setdefault
type method for sets
, but you can do something like this:
number_list = [1,5,7,2,4,4,1,3,8,5]
number_set = set()
for number in number_list:
if number not in number_set and not number_set.add(number):
#do somethihng here
The not number_set.add(number)
condition will be called only if number not in number_set
is True
.
Using this you can process the unique items in ordered way(preserving the order).
>>> number_list = [1,5,7,2,4,4,1,3,8,5]
>>> seen = set()
>>> [x for x in number_list if x not in seen and not seen.add(x)]
[1, 5, 7, 2, 4, 3, 8]
If the order doesn't matter then simply call set()
on number_list
:
>>> set(number_list)
{1, 2, 3, 4, 5, 7, 8}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.