
Is there a 'dict.setdefault' equivalent for sets?

A common pattern when working with a set is the following:

number_list = [1,5,7,2,4,4,1,3,8,5]
number_set = set()

for number in number_list:

   #we only want to process the number if we haven't already processed it
   if(number not in number_set):
       number_set.add(number)

       #do processing of 'number' here now that we know it's not a duplicate

The lines if(number not in number_set): and number_set.add(number) bug me because we're doing two hash lookups here, when realistically we should only need one.

Dictionaries have the "setdefault" operation, which solves a very similar problem: "If the key exists in the dictionary, return its value; otherwise insert this default and then return the default". If you do this naively, i.e. as in the following, you perform two hash lookups, but setdefault allows you to do it in one:

# 'd' is an ordinary dictionary; naming it 'dict' would shadow the built-in type
if item_key in d:
   d[item_key].append(item_value)
else:
   d[item_key] = [item_value]
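For comparison, here is a minimal sketch of the same grouping written with setdefault, which does the lookup-or-insert in one call. The names groups and pairs and the sample data are made up for illustration.

```python
# Hypothetical sample data: (key, value) pairs to group by key.
pairs = [("a", 1), ("b", 2), ("a", 3)]

groups = {}
for item_key, item_value in pairs:
    # setdefault returns the existing list for item_key, or inserts
    # the new empty list and returns that, in a single call.
    groups.setdefault(item_key, []).append(item_value)

print(groups)
```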

Is there an equivalent operation for sets? Something like if(number_set.check_if_contains_and_then_add(number)): but given a much nicer name.

No, there is not.

The setdefault method sets the default value for a key in a dictionary. Sets don't have values, so an equivalent would be pointless.

Try this instead if the order doesn't matter.

number_list = [1,5,7,2,4,4,1,3,8,5]
number_set = set(number_list)

for number in number_set:
   #do processing of 'number' here now that we know it's not a duplicate
   pass

If the profiler tells you that hash lookups contribute significant runtime, then this might work around it.

def add_value(container, value):
    oldlen = len(container)
    container.add(value)
    return len(container) != oldlen

if add_value(number_set, number):
    # process number
    pass

But why would that be? Perhaps due to a slow __hash__ method, although I can tell you now that (a) hashing integers isn't slow and (b) if you possibly can, it's better to make the class with the slow __hash__ cache the result instead of reducing the number of calls. Or perhaps due to a slow __eq__, which is harder to deal with. Finally, if the internal lookup mechanism itself is slow, then there may not be a great deal you can do to speed your program up, because the runtime is doing hash lookups all the time, for example when finding names in scopes.
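To illustrate point (b), here is a minimal sketch of a class that caches its hash. The class Point and its attributes are hypothetical, and the approach assumes instances are never mutated after construction, since a cached hash would otherwise go stale.

```python
class Point:
    """Hypothetical immutable value type whose hash is computed only once."""

    __slots__ = ("_coords", "_hash")

    def __init__(self, coords):
        self._coords = tuple(coords)
        self._hash = None  # filled in lazily on the first hash() call

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._coords == other._coords

    def __hash__(self):
        # Compute the (possibly expensive) hash once, then reuse it.
        if self._hash is None:
            self._hash = hash(self._coords)
        return self._hash


seen = set()
p = Point([1, 2, 3])
seen.add(p)
print(Point([1, 2, 3]) in seen)
```

With this in place, the "not in" test followed by add() still hashes the object twice, but the second call only reads the cached value.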

It would probably be nice for set.add to return a value indicating whether or not the set changed, but I think that idea runs up against a principle of the Python libraries (admittedly not universally upheld) that mutating operations don't return a value unless it's fundamental to the operation to do so. So pop() functions return a value, of course, but list.sort() returns None even though it would occasionally be useful to users if it returned self.
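The convention is easy to check directly. A short sketch, with variable names chosen only for illustration:

```python
s = set()
result_add = s.add(1)        # mutates s; the call itself evaluates to None

items = [3, 1, 2]
result_sort = items.sort()   # sorts in place; also evaluates to None

popped = items.pop()         # returning the removed item is the point of pop()

print(result_add, result_sort, popped)
```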

I suppose you could do something like this:

def deduped(iterable):
    seen = set()
    count = 0
    for value in iterable:
        seen.add(value)
        if count != len(seen):
            count += 1
            yield value

for number in deduped(number_list):
    # process number
    pass

Of course, it's pure speculation that the repeated hash lookup is any kind of problem: I would normally write either of those functions with the "if ... not in" test, as in your original code, and the purpose of the function would be to simplify the calling code, not to avoid superfluous hash lookups.
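A sketch of that plainer version, using the explicit membership test. The name deduped_simple is made up here so it doesn't clash with the generator above.

```python
def deduped_simple(iterable):
    """Yield each distinct value once, in first-seen order."""
    seen = set()
    for value in iterable:
        if value not in seen:
            seen.add(value)
            yield value


number_list = [1, 5, 7, 2, 4, 4, 1, 3, 8, 5]
unique_numbers = list(deduped_simple(number_list))
print(unique_numbers)
```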

Why wouldn't you just do number_set.add(number)? The point of setdefault is that it won't overwrite the existing value for a key, if it exists. But a set doesn't have a value, just a key, so overwriting is irrelevant.

No, there's no setdefault-type method for sets, but you can do something like this:

number_list = [1,5,7,2,4,4,1,3,8,5]
number_set = set()

for number in number_list:
   if number not in number_set and not number_set.add(number):
       #do something here
       pass

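Because "and" short-circuits, the not number_set.add(number) condition is evaluated only if number not in number_set is True. Since set.add always returns None, not number_set.add(number) is always True, so it adds the number without changing the outcome of the test.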

Using this, you can process the unique items while preserving their original order.

>>> number_list = [1,5,7,2,4,4,1,3,8,5]
>>> seen = set()
>>> [x for x in number_list if x not in seen and not seen.add(x)]
[1, 5, 7, 2, 4, 3, 8]

If the order doesn't matter, then simply call set() on number_list:

>>> set(number_list)
{1, 2, 3, 4, 5, 7, 8}
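One further option, not from the answers above: if order does matter and you only need the unique items as a list, dict.fromkeys should also work without a separate membership test. This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on. The name unique_in_order is illustrative.

```python
number_list = [1, 5, 7, 2, 4, 4, 1, 3, 8, 5]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_in_order = list(dict.fromkeys(number_list))
print(unique_in_order)
```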
