
Is there a 'dict.setdefault' equivalent for sets?

A common pattern when working with a set is the following:

number_list = [1, 5, 7, 2, 4, 4, 1, 3, 8, 5]
number_set = set()

for number in number_list:
    # we only want to process the number if we haven't already processed it
    if number not in number_set:
        number_set.add(number)
        # do processing of 'number' here now that we know it's not a duplicate
        pass

The number not in number_set test and the number_set.add(number) call bug me because we're doing two hash lookups here, when realistically we should only need one.
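To make the "two hash lookups" point concrete, here is a small sketch that counts __hash__ calls. CountingKey is a hypothetical wrapper class, and the count assumes CPython's set implementation, which hashes the key once for the membership test and once more inside add().

```python
class CountingKey:
    """Hypothetical wrapper that counts how often Python hashes it."""

    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)


seen = set()
key = CountingKey(4)
if key not in seen:   # first hash: the membership test
    seen.add(key)     # second hash: the insertion
print(CountingKey.hash_calls)  # 2
```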

Dictionaries have the setdefault operation, which solves a very similar problem: "If the key exists in the dictionary, return its value; otherwise insert this default and then return the default." If you do this naively, i.e. as follows, you perform two hash lookups, but setdefault lets you do it in one:

# d maps each key to a list of values
if item_key in d:
    d[item_key].append(item_value)
else:
    d[item_key] = [item_value]
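For comparison, this is roughly what the one-lookup setdefault version of that update looks like. It is a small sketch; the names groups and add_item are illustrative, not from the question.

```python
def add_item(groups, item_key, item_value):
    # setdefault returns the existing list for item_key, or inserts [] and
    # returns that new list, so a single call replaces the if/else above.
    groups.setdefault(item_key, []).append(item_value)


groups = {}
add_item(groups, "a", 1)
add_item(groups, "a", 2)
add_item(groups, "b", 3)
print(groups)  # {'a': [1, 2], 'b': [3]}
```

One trade-off to be aware of: the default [] is built on every call, even when the key already exists.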

Is there an equivalent operation for sets? Something like number_set.check_if_contains_and_then_add(number), but with a much nicer name.

No, there is not.

The setdefault method is used to set the default value of a key in a dictionary. Sets don't have values, so such a method would be completely pointless for them.

Try this instead if the order doesn't matter:

number_list = [1, 5, 7, 2, 4, 4, 1, 3, 8, 5]
number_set = set(number_list)

for number in number_set:
    # do processing of 'number' here; set() has already removed the duplicates
    pass

If the profiler tells you that hash lookups contribute significantly to runtime, then this might work around it:

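def add_value(container, value):
    """Add value to container; return True if it was not already present."""
    oldlen = len(container)
    container.add(value)
    return len(container) != oldlen

for number in number_list:
    if add_value(number_set, number):
        pass  # process number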
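Here is a self-contained run of the length-comparison trick on the question's list. It is a sketch; first_occurrences is a hypothetical helper name, and appending to a list stands in for "process number".

```python
def add_value(container, value):
    # add() hashes value once; comparing lengths tells us whether it was new,
    # since adding an element that is already present leaves the size unchanged.
    oldlen = len(container)
    container.add(value)
    return len(container) != oldlen


def first_occurrences(numbers):
    seen = set()
    processed = []
    for number in numbers:
        if add_value(seen, number):
            processed.append(number)  # stand-in for the real processing
    return processed


print(first_occurrences([1, 5, 7, 2, 4, 4, 1, 3, 8, 5]))  # [1, 5, 7, 2, 4, 3, 8]
```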

But why would that be? Perhaps due to a slow __hash__ method, although I can tell you now that (a) hashing integers isn't slow and (b) if you possibly can, it's better to make the class with the slow __hash__ cache its result than to reduce the number of calls. Or perhaps due to a slow __eq__, which is harder to deal with. Finally, if the internal lookup mechanism itself is slow, then there may not be a great deal you can do to speed your program up, because the runtime is doing hash lookups all the time, finding names in scopes.
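Caching the hash, as suggested in (b), might look something like the sketch below. Point is a hypothetical immutable class whose hash we pretend is expensive; caching is only safe because the fields that feed the hash never change after construction.

```python
class Point:
    hash_computations = 0  # counts how often the "expensive" hash actually runs

    def __init__(self, *coords):
        self._coords = tuple(coords)  # treated as immutable
        self._hash = None             # filled in lazily on the first __hash__ call

    def __eq__(self, other):
        return isinstance(other, Point) and self._coords == other._coords

    def __hash__(self):
        # Compute once, then reuse the stored value on every later call.
        if self._hash is None:
            Point.hash_computations += 1
            self._hash = hash(self._coords)
        return self._hash


p = Point(1, 2)
seen = set()
if p not in seen:   # computes and caches the hash
    seen.add(p)     # reuses the cached hash
print(Point.hash_computations)  # 1
```

The set still calls __hash__ twice, but only the first call does the real work.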

It would probably be nice for set.add to return a value indicating whether or not the set changed, but I think that idea runs up against a principle of the Python libraries (admittedly not universally upheld) that mutating operations don't return a value unless doing so is fundamental to the operation. So pop() functions return a value, of course, but list.sort() returns None even though it would occasionally be useful to users if it returned self.
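A quick illustration of that convention using only built-in methods: set.add and list.sort return None, while pop returns the element it removes.

```python
s = set()
print(s.add(1))     # None
print(s.add(1))     # None again: add() gives no hint that 1 was already present
print(s)            # {1}

nums = [3, 1, 2]
print(nums.sort())  # None: the list is sorted in place
print(nums)         # [1, 2, 3]
print(nums.pop())   # 3: pop() returns the element it removed
```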

I suppose you could do something like this:

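def deduped(iterable):
    """Yield each value from iterable the first time it is seen."""
    seen = set()
    count = 0
    for value in iterable:
        seen.add(value)
        if count != len(seen):  # the set grew, so value is new
            count += 1
            yield value

for number in deduped(number_list):
    pass  # process number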

Of course, it's pure speculation that the repeated hash lookup is any kind of problem: I would normally write either of those functions with the if not in test, as in your original code, and the purpose of the function would be to simplify the calling code, not to avoid superfluous hash lookups.
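For comparison, the more conventional version of deduped described here, using a plain membership test, might look like this (a sketch; it does two hash lookups per new value but reads more directly):

```python
def deduped(iterable):
    """Yield each value the first time it appears, preserving order."""
    seen = set()
    for value in iterable:
        if value not in seen:  # plain membership test, at the cost of a second hash
            seen.add(value)
            yield value


print(list(deduped([1, 5, 7, 2, 4, 4, 1, 3, 8, 5])))  # [1, 5, 7, 2, 4, 3, 8]
```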

Why wouldn't you just do number_set.add(number)? The point of setdefault is that it won't overwrite the existing value for a key, if one exists. But a set doesn't have values, just keys, so overwriting is irrelevant.

No, there's no setdefault-style method for sets, but you can do something like this:

number_list = [1, 5, 7, 2, 4, 4, 1, 3, 8, 5]
number_set = set()

for number in number_list:
    if number not in number_set and not number_set.add(number):
        pass  # do something here

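The not number_set.add(number) condition is evaluated only if number not in number_set is True. Since set.add always returns None, not number_set.add(number) is always True, so it simply adds the number as a side effect. Note that this still performs two hash lookups; it only folds them into a single condition.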

Using this, you can process the unique items in their original order:

>>> number_list = [1,5,7,2,4,4,1,3,8,5]
>>> seen = set()
>>> [x for x in number_list if x not in seen and not seen.add(x)]
[1, 5, 7, 2, 4, 3, 8]
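If the goal is just an order-preserving list of unique items, a different technique avoids the side-effect trick: dict.fromkeys. This sketch assumes Python 3.7 or later, where dicts are guaranteed to keep insertion order.

```python
number_list = [1, 5, 7, 2, 4, 4, 1, 3, 8, 5]

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each number wins.
unique_in_order = list(dict.fromkeys(number_list))
print(unique_in_order)  # [1, 5, 7, 2, 4, 3, 8]
```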

If the order doesn't matter, then simply call set() on number_list:

>>> set(number_list)
{1, 2, 3, 4, 5, 7, 8}

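Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.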

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM