Use cases for the 'setdefault' dict method

The addition of collections.defaultdict in Python 2.5 greatly reduced the need for dict's setdefault method. This question is for our collective education:

  1. What is setdefault still useful for, today in Python 2.6/2.7?
  2. What popular use cases of setdefault were superseded by collections.defaultdict?

You could say defaultdict is useful for setting defaults before filling the dict, and setdefault is useful for setting defaults while or after filling the dict.

Probably the most common use case: grouping items (in unsorted data; for sorted data use itertools.groupby):

# really verbose
new = {}
for (key, value) in data:
    if key in new:
        new[key].append( value )
    else:
        new[key] = [value]


# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # key might exist already
    group.append( value )


# even simpler with defaultdict 
from collections import defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append( value ) # all keys have a default already

Sometimes you want to make sure that specific keys exist after creating a dict. defaultdict doesn't work in this case, because it only creates keys on explicit access. Suppose you use something HTTP-ish with many headers, some of which are optional, but you want defaults for them:

headers = parse_headers( msg ) # parse the message, get a dict
# now add all the optional headers
for headername, defaultvalue in optional_headers:
    headers.setdefault( headername, defaultvalue )

I commonly use setdefault for keyword argument dicts, such as in this function:

def notify(self, level, *pargs, **kwargs):
    kwargs.setdefault("persist", level >= DANGER)
    self.__defcon.set(level, **kwargs)
    try:
        kwargs.setdefault("name", self.client.player_entity().name)
    except pytibia.PlayerEntityNotFound:
        pass
    return _notify(level, *pargs, **kwargs)

It's great for tweaking arguments in wrappers around functions that take keyword arguments.
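
A minimal sketch of that wrapper pattern, stripped of the application-specific pieces above. The names send and send_urgent and the option names are hypothetical and only serve as illustration:

```python
def send(message, **options):
    """Hypothetical low-level function that accepts keyword options."""
    return (message, options)


def send_urgent(message, **options):
    # Fill in defaults only for options the caller did not pass explicitly.
    options.setdefault("priority", "high")
    options.setdefault("retries", 3)
    return send(message, **options)


print(send_urgent("disk full"))
print(send_urgent("disk full", retries=0))
```

An explicitly passed value, even a falsy one such as retries=0, is left alone because setdefault only checks whether the key is present.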

defaultdict is great when the default value is static, like a new list, but not so much if it's dynamic.

For example, I need a dictionary to map strings to unique ints. defaultdict(int) will always use 0 for the default value. Likewise, defaultdict(intGen()) always produces 1.

Instead, I used a regular dict:

nextID = intGen()
myDict = {}
for lots of complicated stuff:
    #stuff that generates unpredictable, possibly already seen str
    strID = myDict.setdefault(myStr, nextID())

Note that dict.get(key, nextID()) is insufficient because I need to be able to refer to these values later as well.

intGen is a tiny class I built that automatically increments an int and returns its value:

class intGen:
    def __init__(self):
        self.i = 0

    def __call__(self):
        self.i += 1
        return self.i

If someone has a way to do this with defaultdict I'd love to see it.
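
One possible way with defaultdict, sketched under the assumption of Python 3: pass a stateful callable as the factory, here the bound __next__ method of itertools.count. The sample strings are made up for illustration:

```python
from collections import defaultdict
from itertools import count

# Each missing key triggers exactly one call to the factory: 1, 2, 3, ...
ids = defaultdict(count(1).__next__)

for word in ["spam", "eggs", "spam", "ham"]:
    ids[word]  # a plain lookup assigns an ID the first time a string is seen

print(dict(ids))
```

Because the factory runs only for keys that are actually missing, no IDs are skipped, whereas setdefault(myStr, nextID()) calls nextID() on every lookup and so leaves gaps in the numbering.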

As most answers state, setdefault or defaultdict lets you set a default value when a key doesn't exist. However, I would like to point out a small caveat with regard to the use cases of setdefault. When the Python interpreter executes setdefault it will always evaluate the second argument to the function, even if the key exists in the dictionary. For example:

In: d = {1:5, 2:6}

In: d
Out: {1: 5, 2: 6}

In: d.setdefault(2, 0)
Out: 6

In: d.setdefault(2, print('test'))
test
Out: 6

As you can see, print was also executed even though 2 already existed in the dictionary. This becomes particularly important if you are planning to use setdefault for an optimization like memoization. If you add a recursive function call as the second argument to setdefault, you won't gain any performance from it, because Python will always make the recursive call.

Since memoization was mentioned, a better alternative is the functools.lru_cache decorator if you are considering enhancing a function with memoization. lru_cache handles the caching requirements of a recursive function better.
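
As a small illustration of the lru_cache route (Python 3.2+), here is a memoized recursive function. The Fibonacci function is just a stand-in example, not something from the answers above:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeated calls are served from the cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(80))
print(fib.cache_info())
```

cache_info() reports hits and misses, which makes it easy to confirm that the recursive calls are actually being cached.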

I use setdefault() when I want a default value in an OrderedDict. There isn't a standard Python collection that does both, but there are ways to implement such a collection.
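
A short sketch of that combination, with made-up sample data: grouping values while keeping the keys in first-seen order.

```python
from collections import OrderedDict

groups = OrderedDict()
for key, value in [("b", 1), ("a", 2), ("b", 3)]:
    # setdefault supplies the empty list; OrderedDict remembers insertion order.
    groups.setdefault(key, []).append(value)

print(list(groups.items()))
```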

As Muhammad said, there are situations in which you only sometimes wish to set a default value. A great example of this is a data structure which is first populated, then queried.

Consider a trie. When adding a word, if a subnode is needed but not present, it must be created to extend the trie. When querying for the presence of a word, a missing subnode indicates that the word is not present, and it should not be created.

A defaultdict cannot do this. Instead, a regular dict with the get and setdefault methods must be used.
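
A minimal sketch of such a trie built from nested plain dicts. The function names and the "$" end-of-word marker are illustrative assumptions, and words are assumed not to contain "$" themselves:

```python
def add_word(trie, word):
    node = trie
    for ch in word:
        # Create the subnode only while inserting.
        node = node.setdefault(ch, {})
    node["$"] = True  # "$" is an arbitrary end-of-word marker


def contains(trie, word):
    node = trie
    for ch in word:
        # get() never creates a subnode, so queries leave the trie untouched.
        node = node.get(ch)
        if node is None:
            return False
    return "$" in node


trie = {}
add_word(trie, "car")
add_word(trie, "cat")
print(contains(trie, "cat"), contains(trie, "cab"))
```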

Theoretically speaking, setdefault would still be handy if you sometimes want to set a default and sometimes not. In real life, I haven't come across such a use case.

However, an interesting use case comes up from the standard library (Python 2.6, _threadinglocal.py):

>>> mydata = local()
>>> mydata.__dict__
{'number': 42}
>>> mydata.__dict__.setdefault('widgets', [])
[]
>>> mydata.widgets
[]

I would say that using __dict__.setdefault is a pretty useful case.

Edit: As it happens, this is the only example in the standard library, and it is in a comment. So maybe it is not enough of a case to justify the existence of setdefault. Still, here is an explanation:

Objects store their attributes in the __dict__ attribute. As it happens, the __dict__ attribute is writable at any time after the object is created. It is also a dictionary, not a defaultdict. It is not sensible for objects in the general case to have __dict__ as a defaultdict, because that would make each object have all legal identifiers as attributes. So I can't foresee any change to Python objects getting rid of __dict__.setdefault, apart from deleting it altogether if it were deemed not useful.

One drawback of defaultdict compared to dict (dict.setdefault) is that a defaultdict object creates a new item EVERY TIME a non-existing key is given (e.g. with ==, print). Also, the defaultdict class is generally far less common than the dict class, and in my experience it is more difficult to serialize.
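
A small sketch of that side effect, using made-up key names: merely indexing a defaultdict with a missing key inserts it, while get() and membership tests do not.

```python
from collections import defaultdict

dd = defaultdict(list)
plain = {}

dd["missing"] == []         # indexing inside a comparison inserts the key
plain.get("missing") == []  # get() never inserts

print(len(dd), len(plain))
print("other" in dd, len(dd))  # a membership test does not insert either
```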

PS: IMO functions or methods not meant to mutate an object should not mutate an object.

I rewrote the accepted answer to make it easier for newbies.

#break it down and understand it intuitively.
new = {}
for (key, value) in data:
    if key not in new:
        new[key] = [] # this is core of setdefault equals to new.setdefault(key, [])
        new[key].append(value)
    else:
        new[key].append(value)


# easy with setdefault
new = {}
for (key, value) in data:
    group = new.setdefault(key, []) # it is new[key] = []
    group.append(value)



# even simpler with defaultdict
new = defaultdict(list)
for (key, value) in data:
    new[key].append(value) # all keys have a default value of empty list []

Additionally, I categorized the methods for reference:

dict_methods_11 = {
            'views':['keys', 'values', 'items'],
            'add':['update','setdefault'],
            'remove':['pop', 'popitem','clear'],
            'retrieve':['get',],
            'copy':['copy','fromkeys'],}

Here are some examples of setdefault to show its usefulness:

"""
d = {}
# To add a key->value pair, do the following:
d.setdefault(key, []).append(value)

# To retrieve a list of the values for a key
list_of_values = d[key]

# To remove a key->value pair is still easy, if
# you don't mind leaving empty lists behind when
# the last value for a given key is removed:
d[key].remove(value)

# Despite the empty lists, it's still possible to
# test for the existence of values easily:
if key in d and d[key]:
    pass # d has some values for key

# Note: Each value can exist multiple times!
"""
e = {}
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)
e.setdefault('Motorcycles', []).append('Yamaha')
print(e)
e.setdefault('Airplanes', []).append('Boeing')
print(e)
e.setdefault('Cars', []).append('Honda')
print(e)
e.setdefault('Cars', []).append('BMW')
print(e)
e.setdefault('Cars', []).append('Toyota')
print(e)

# NOTE: now e['Cars'] == ['Toyota', 'Honda', 'BMW', 'Toyota']
e['Cars'].remove('Toyota')
print(e)
# NOTE: it's still true that ('Toyota' in e['Cars'])

I use setdefault frequently when, get this, setting a default in a dictionary; somewhat commonly the os.environ dictionary:

# Set the venv dir if it isn't already overridden:
os.environ.setdefault('VENV_DIR', '/my/default/path')

Less succinctly, this looks like this:

# Set the venv dir if it isn't already overridden:
if 'VENV_DIR' not in os.environ:
    os.environ['VENV_DIR'] = '/my/default/path'

It's worth noting that you can also use the resulting variable:

venv_dir = os.environ.setdefault('VENV_DIR', '/my/default/path')

But that's less necessary than it was before defaultdicts existed.

Another use case that I don't think was mentioned above: sometimes you keep a cache dict of objects keyed by their id, where the primary instance is in the cache, and you want to set the cache entry when it is missing.

return self.objects_by_id.setdefault(obj.id, obj)

That's useful when you always want to keep a single instance per distinct id no matter how you obtain an obj each time, for example when object attributes get updated in memory and saving to storage is deferred.
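
A self-contained sketch of that pattern. The Registry and Record classes are hypothetical stand-ins for whatever owns objects_by_id in real code:

```python
class Record:
    def __init__(self, id, name):
        self.id = id
        self.name = name


class Registry:
    def __init__(self):
        self.objects_by_id = {}

    def canonical(self, obj):
        # Return the instance already stored under obj.id; store obj if there is none.
        return self.objects_by_id.setdefault(obj.id, obj)


registry = Registry()
first = registry.canonical(Record(1, "loaded by query A"))
second = registry.canonical(Record(1, "loaded by query B"))
print(first is second, second.name)
```

The second Record is discarded in favour of the first, so every caller ends up holding the same in-memory instance for id 1.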

One very important use-case I just stumbled across: dict.setdefault() is great for multi-threaded code when you only want a single canonical object (as opposed to multiple objects that happen to be equal).

For example, the (Int)Flag Enum in Python 3.6.0 has a bug: if multiple threads are competing for a composite (Int)Flag member, there may end up being more than one:

from enum import IntFlag, auto
import threading

class TestFlag(IntFlag):
    one = auto()
    two = auto()
    three = auto()
    four = auto()
    five = auto()
    six = auto()
    seven = auto()
    eight = auto()

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return hash(self.value)

seen = set()

class cycle_enum(threading.Thread):
    def run(self):
        for i in range(256):
            seen.add(TestFlag(i))

threads = []
for i in range(8):
    threads.append(cycle_enum())

for t in threads:
    t.start()

for t in threads:
    t.join()

len(seen)
# 272  (should be 256)

The solution is to use setdefault() as the last step of saving the computed composite member -- if another has already been saved then it is used instead of the new one, guaranteeing unique Enum members.

In addition to what has been suggested, setdefault might be useful in situations where you don't want to modify a value that has already been set. For example, when you have duplicate numbers and you want to treat them as one group. In this case, if you encounter a repeated key which has already been set, you won't update the value of that key. You will keep the first encountered value, as if you were updating each repeated key only once.

Here's a code example of recording the index for the keys/elements of a sorted list:

nums = [2,2,2,2,2]
d = {}
for idx, num in enumerate(sorted(nums)):
    # Plain assignment is overwritten on every repeat, leaving the index of the last occurrence:
    # d[num] = idx # Result (sorted_indices): [4, 4, 4, 4, 4]
    # With setdefault, repeated keys do not update the entry,
    # so only the index of the first occurrence is stored.
    d.setdefault(num,idx) # Result (sorted_indices): [0, 0, 0, 0, 0]

sorted_indices = [d[i] for i in nums]

[Edit] Very wrong! The setdefault would always trigger long_computation, Python being eager.

Expanding on Tuttle's answer. For me the best use case is a cache mechanism. Instead of:

if x not in memo:
   memo[x]=long_computation(x)
return memo[x]

which consumes 3 lines and 2 or 3 lookups, I would happily write:

return memo.setdefault(x, long_computation(x))
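
A small sketch that makes the problem noted in the edit visible by counting calls. long_computation here is a trivial stand-in and the counter is only for illustration:

```python
counter = {"calls": 0}
memo = {}


def long_computation(x):
    counter["calls"] += 1
    return x * x


def cached_eager(x):
    # The argument expression is evaluated before setdefault is even called.
    return memo.setdefault(x, long_computation(x))


def cached_lazy(x):
    if x not in memo:
        memo[x] = long_computation(x)
    return memo[x]


cached_eager(4)
cached_eager(4)
eager_calls = counter["calls"]

cached_lazy(4)
cached_lazy(4)
lazy_calls = counter["calls"] - eager_calls

print(eager_calls, lazy_calls)
```

The setdefault version runs the computation on every call, while the explicit membership check runs it only when the key is missing.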

I like the answer given here:

http://stupidpythonideas.blogspot.com/2013/08/defaultdict-vs-setdefault.html

In short, the decision (in non-performance-critical apps) should be made on the basis of how you want to handle lookup of empty keys downstream (viz. KeyError versus default value).

The different use case for setdefault() is when you don't want to overwrite the value of an already set key. Assigning to a key (on a defaultdict as on any dict) overwrites, while setdefault() does not. For nested dictionaries it is more often the case that you want to set a default only if the key is not set yet, because you don't want to remove the present sub dictionary. This is when you use setdefault().

Example with defaultdict:

>>> from collections import defaultdict
>>> foo = defaultdict()
>>> foo['a'] = 4
>>> foo['a'] = 2
>>> print(foo)
defaultdict(None, {'a': 2})

setdefault doesn't overwrite:

>>> bar = dict()
>>> bar.setdefault('a', 4)
4
>>> bar.setdefault('a', 2)
4
>>> print(bar)
{'a': 4}
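
For the nested-dictionary case mentioned above, a brief sketch with a made-up configuration dict:

```python
config = {"db": {"host": "example.internal"}}

# The existing sub-dictionary is kept; only the missing key inside it is filled in.
config.setdefault("db", {}).setdefault("port", 5432)
# A missing sub-dictionary is created on the way.
config.setdefault("cache", {}).setdefault("ttl", 60)

print(config)
```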

Another use case for setdefault in CPython is that it is atomic in all cases, whereas defaultdict will not be atomic if you use a default value created from a lambda.

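import threading

cache = {}

def get_user_roles(user_id):
    # Fast path: the roles for this user were already fetched.
    entry = cache.get(user_id)
    if entry is not None and 'roles' in entry:
        return entry['roles']

    # Only one lock ends up stored per user_id, even if two threads race here.
    cache.setdefault(user_id, {'lock': threading.Lock()})

    with cache[user_id]['lock']:
        # query_roles_from_database is assumed to be defined elsewhere.
        roles = query_roles_from_database(user_id)
        cache[user_id]['roles'] = roles
    return roles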

If two threads execute cache.setdefault at the same time, only one of them will be able to create the default value.

If instead you used a defaultdict:

cache = defaultdict(lambda: {'lock': threading.Lock()})

This would result in a race condition. In my example above, the first thread could create a default lock, and the second thread could create another default lock, and then each thread could lock its own default lock, instead of the desired outcome of each thread attempting to lock a single lock.

Conceptually, setdefault basically behaves like this (defaultdict also behaves like this if you use an empty list, empty dict, int, or other default value that is not user Python code like a lambda):

import threading

gil = threading.Lock()

def setdefault(dict, key, value_func):
    with gil:
        # An existing key is returned untouched.
        if key in dict:
            return dict[key]

        value = value_func()

        dict[key] = value
        return value

Conceptually, defaultdict basically behaves like this (only when using Python code like a lambda; this is not true if you use an empty list):

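import threading

gil = threading.Lock()

def __getitem__(dict, key, value_func):
    with gil:
        if key in dict:
            return dict[key]

    # The factory runs outside the lock, so a second thread can reach this point too.
    value = value_func()

    with gil:
        dict[key] = value
    return value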
