
python dict: get vs setdefault

The following two expressions seem equivalent to me. Which one is preferable?

data = [('a', 1), ('b', 1), ('b', 2)]

d1 = {}
d2 = {}

for key, val in data:
    # variant 1)
    d1[key] = d1.get(key, []) + [val]
    # variant 2)
    d2.setdefault(key, []).append(val)

The results are the same, but which version is better, or rather, more Pythonic?

Personally I find version 2 harder to understand, because setdefault is very tricky for me to grasp. If I understand correctly, it looks up "key" in the dictionary. If the key is absent, it inserts "[]" into the dict. Either way it returns a reference to the stored value (the existing one or the new "[]"), and "val" is appended to that reference. It is certainly smooth, but it is not intuitive in the least (at least to me).
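Here is a minimal sketch that checks that reading. setdefault hands back the very list object stored in the dict, so appending to its return value updates the dict in place. The names `first` and `second` are illustrative only.

```python
d = {}

# Key missing: setdefault inserts the default and returns that same object
first = d.setdefault('b', [])
first.append(1)

# Key present: the new [] is ignored and the stored list is returned
second = d.setdefault('b', [])
second.append(2)

print(first is second)  # True: both names refer to the list inside d
print(d)                # {'b': [1, 2]}
```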

To my mind, version 1 is easier to understand. If available, get the value for "key"; if not, get "[]". Then concatenate it with a list made from [val] and store the result under "key". But while it is more intuitive, I fear this version is less performant, with all this list creation. Another disadvantage is that "d1" occurs twice in the expression, which is rather error-prone. There is probably a better implementation using get, but at the moment it eludes me.
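One possible get-based version avoids rebuilding the list. It does the lookup once, inserts a new list only for an unseen key, and then appends in place. This is a sketch, and it assumes `None` is never stored as a value in `d1`, because `None` is used to mean "absent".

```python
data = [('a', 1), ('b', 1), ('b', 2)]

d1 = {}
for key, val in data:
    # Single lookup; None signals that key is not in d1 yet
    lst = d1.get(key)
    if lst is None:
        # Create and store a fresh list only for a new key
        lst = d1[key] = []
    lst.append(val)  # mutate in place instead of concatenating

print(d1)  # {'a': [1], 'b': [1, 2]}
```

It is wordier than setdefault, which is one reason the setdefault form is usually preferred for this pattern.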

My guess is that version 2, although more difficult for the inexperienced to grasp, is faster and therefore preferable. Opinions?

Your two examples do the same thing, but that doesn't mean get and setdefault do.

The difference between the two comes down to this: with get, you manually set d[key] to point to the list every time; setdefault automatically sets d[key] to the list only when it is unset.
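A minimal sketch of that difference follows. The helper names `append_with_get` and `append_with_setdefault` are hypothetical and used only for illustration. The first always writes `d[key]` back; the second stores a new list only when the key is missing.

```python
def append_with_get(d, key, val):
    """get-based pattern: the store into d[key] runs on every call."""
    c = d.get(key, [])
    c.append(val)
    d[key] = c  # reassigned even when key already existed

def append_with_setdefault(d, key, val):
    """setdefault stores [] only if key is missing, then returns the stored list."""
    d.setdefault(key, []).append(val)

d_get, d_set = {}, {}
for key, val in [('a', 1), ('b', 1), ('b', 2)]:
    append_with_get(d_get, key, val)
    append_with_setdefault(d_set, key, val)

print(d_get == d_set)  # True: same result, different number of stores
```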

Making the two methods as similar as possible, I ran

from timeit import timeit

print(timeit("c = d.get(0, []); c.extend([1]); d[0] = c", "d = {1: []}", number=1000000))
print(timeit("c = d.get(1, []); c.extend([1]); d[0] = c", "d = {1: []}", number=1000000))
print(timeit("d.setdefault(0, []).extend([1])", "d = {1: []}", number=1000000))
print(timeit("d.setdefault(1, []).extend([1])", "d = {1: []}", number=1000000))

and got

0.794723378711
0.811882272256
0.724429205999
0.722129751973

So setdefault is around 10% faster than get for this purpose.

The get method lets you do less than setdefault does. You can use it to avoid a KeyError when the key doesn't exist (if that is going to happen frequently), even if you don't want to set the key.
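The short sketch below shows that point. get() returns a fallback for a missing key and leaves the dict unchanged, while plain indexing raises KeyError.

```python
d = {'a': 1}

# get() never raises KeyError and never inserts the key
print(d.get('missing'))     # None (the default when none is given)
print(d.get('missing', 0))  # 0
print(d)                    # {'a': 1}

# Plain indexing raises for an absent key
try:
    d['missing']
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'missing'
```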

See the questions "Use cases for the 'setdefault' dict method" and "dict.get() method returns a pointer" for more information about the two methods.

The thread about setdefault concludes that most of the time, you want to use a defaultdict. The thread about get concludes that it is slow. Speed-wise, you are often better off doing a double lookup, using a defaultdict, or handling the error, depending on the size of the dictionary and your use case.
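For reference, here is a sketch of the lookup styles that thread compares. The function names are hypothetical. The sketch only shows that all three return the same values. Which one is fastest depends on the hit rate and the dict, as noted above, and nothing is measured here.

```python
def lookup_get(d, key, default):
    # One method call; the default argument is evaluated every time
    return d.get(key, default)

def lookup_double(d, key, default):
    # "Look before you leap": membership test, then index
    return d[key] if key in d else default

def lookup_try(d, key, default):
    # "Easier to ask forgiveness": index directly, fall back on KeyError
    try:
        return d[key]
    except KeyError:
        return default

d = {'x': 1}
for fn in (lookup_get, lookup_double, lookup_try):
    print(fn.__name__, fn(d, 'x', 0), fn(d, 'y', 0))
```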

The accepted answer from agf isn't comparing like with like. After:

print(timeit("d[0] = d.get(0, []) + [1]", "d = {1: []}", number=10000))

d[0] contains a list with 10,000 items, whereas after:

print(timeit("d.setdefault(0, []) + [1]", "d = {1: []}", number=10000))

d[0] is simply [], i.e. the d.setdefault version never modifies the list stored in d. The code should actually be:

print(timeit("d.setdefault(0, []).append(1)", "d = {1: []}", number=10000))

which is in fact faster than the faulty setdefault example.

The difference here is really because concatenation copies the whole list every time you append, and once you have 10,000 elements that starts to become measurable. With append, each list update is amortised O(1), i.e. effectively constant time.
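A small sketch makes the copying visible. Concatenation produces a new list object and leaves the old one untouched, while append grows the same object.

```python
lst = [1, 2, 3]
old = lst

# Concatenation builds a brand-new list, copying every existing element
lst = lst + [4]
print(lst is old)  # False: old is still [1, 2, 3]

# append() grows the same object in place (amortised O(1))
same = lst
lst.append(5)
print(lst is same)  # True
print(lst)          # [1, 2, 3, 4, 5]
```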

Finally, there are two other options not considered in the original question: defaultdict, or simply testing the dictionary to see whether it already contains the key.

So, assuming d3, d4 = defaultdict(list), {}:

# variant 1 (0.39)
d1[key] = d1.get(key, []) + [val]
# variant 2 (0.003)
d2.setdefault(key, []).append(val)
# variant 3 (0.0017)
d3[key].append(val)
# variant 4 (0.002)
if key in d4:
    d4[key].append(val)
else:
    d4[key] = [val]
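The sketch below runs all four variants on the question's sample data and checks that they agree. It is not meant to reproduce the timings quoted in the comments, whose units and setup are not stated.

```python
from collections import defaultdict

data = [('a', 1), ('b', 1), ('b', 2)]
d1, d2 = {}, {}
d3, d4 = defaultdict(list), {}

for key, val in data:
    # variant 1: rebuilds the list on every step
    d1[key] = d1.get(key, []) + [val]
    # variant 2: setdefault, then in-place append
    d2.setdefault(key, []).append(val)
    # variant 3: defaultdict creates the empty list on first access
    d3[key].append(val)
    # variant 4: explicit membership test
    if key in d4:
        d4[key].append(val)
    else:
        d4[key] = [val]

# A defaultdict compares equal to a plain dict with the same items
print(d1 == d2 == d3 == d4)  # True
```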

Variant 1 is by far the slowest because it copies the list every time. Variant 2 is the second slowest. Variant 3 is the fastest but won't work if you need to support Python older than 2.5. Variant 4 is just slightly slower than variant 3.

I would say use variant 3 if you can, with variant 4 as an option for those occasional places where defaultdict isn't an exact fit. Avoid both of your original variants.

You might want to look at defaultdict in the collections module. The following is equivalent to your examples.

from collections import defaultdict

data = [('a', 1), ('b', 1), ('b', 2)]

d = defaultdict(list)

for k, v in data:
    d[k].append(v)
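One behaviour to keep in mind with defaultdict is illustrated in the sketch below. Reading a missing key with `[]` inserts it, while `get()` and `in` do not call the default factory. Where lookups of missing keys should not create entries, the get/setdefault or membership-test styles may fit better. The key `'typo'` is just an example.

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)

# Merely reading a missing key with [] inserts an empty list
_ = d['typo']
print(dict(d))  # {'a': [1], 'typo': []}

# get() and `in` do not trigger the default factory
print(d.get('other'))  # None
print('other' in d)    # False
```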

There's more here.

For those still struggling to understand these two methods, here is the basic difference between get() and setdefault():

Scenario-1

root = {}
root.setdefault('A', [])
print(root)

Scenario-2

root = {}
root.get('A', [])
print(root)

In Scenario-1 the output will be {'A': []}, while in Scenario-2 it will be {}.

So setdefault() inserts absent keys into the dict, while get() only gives you a default value and does not modify the dictionary.

Now let's see where this is useful. Suppose you are looking up a key in a dict whose values are lists. If the key is found, you want to extend its list; otherwise you want to create a new key with that list.

Using setdefault():

def fn1(dic, key, lst):
    dic.setdefault(key, []).extend(lst)

Using get():

def fn2(dic, key, lst):
    dic[key] = dic.get(key, []) + lst  # explicit assignment happens here

Now let's examine the timings:

dic = {}
%timeit -n 10000 -r 4 fn1(dic, 'A', [1,2,3])

Took 288 ns

dic = {}
%timeit -n 10000 -r 4 fn2(dic, 'A', [1,2,3])

Took 128 µs

So there is a very large timing difference between these two approaches. The get() version builds a new list on every call, copying every element already stored.
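Outside IPython, roughly the same comparison can be sketched with the standard `timeit` module. The call count `N` is an arbitrary choice, and the printed numbers will vary by machine and Python version, so no expected output is given.

```python
from timeit import timeit

def fn1(dic, key, lst):
    dic.setdefault(key, []).extend(lst)

def fn2(dic, key, lst):
    dic[key] = dic.get(key, []) + lst

N = 5_000  # calls per function; the stored list grows by 3 items per call

for fn in (fn1, fn2):
    dic = {}  # fresh dict per function, then reused across all N calls
    total = timeit(lambda: fn(dic, 'A', [1, 2, 3]), number=N)
    print(f"{fn.__name__}: {total / N * 1e9:,.0f} ns per call")
```

Because `fn2` copies an ever-longer list, its per-call cost should grow with `N`, while `fn1` stays roughly flat.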

1. Explained with a good example here:
http://code.activestate.com/recipes/66516-add-an-entry-to-a-dictionary-unless-the-entry-is-a/

dict.setdefault typical usage:
somedict.setdefault(somekey,[]).append(somevalue)

dict.get typical usage:
theIndex[word] = 1 + theIndex.get(word,0)
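Here is a self-contained sketch of the get() counting idiom, using a made-up sentence. get() suits this case because ints are immutable: the new count has to be stored back anyway, so setdefault's in-place trick has nothing to mutate.

```python
text = "the cat and the hat and the bat"

theIndex = {}
for word in text.split():
    # get() supplies 0 for unseen words; the sum is then stored back
    theIndex[word] = 1 + theIndex.get(word, 0)

print(theIndex)  # {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```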


2. More explanation: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

dict.setdefault() is equivalent to "get, or set & get", that is, set if necessary, then get. It is especially efficient if your dictionary key is expensive to compute or long to type.

The only problem with dict.setdefault() is that the default value is always evaluated, whether needed or not. That only matters if the default value is expensive to compute. In that case, use defaultdict.
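The sketch below demonstrates that point. `expensive_default` is a hypothetical stand-in for a costly factory, and it records each call. The setdefault argument is evaluated even when the key exists. A defaultdict calls its factory only for missing keys.

```python
from collections import defaultdict

calls = []

def expensive_default():
    # Hypothetical costly factory; records every invocation
    calls.append(1)
    return []

d = {'k': [1]}
# The argument is built before setdefault runs, even though 'k' exists
d.setdefault('k', expensive_default()).append(2)
print(len(calls))  # 1

dd = defaultdict(expensive_default, {'k': [1]})
dd['k'].append(2)    # key exists: factory not called
print(len(calls))    # still 1
dd['new'].append(3)  # key missing: factory called now
print(len(calls))    # 2
```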


3. Finally, the official docs, with the difference highlighted: http://docs.python.org/2/library/stdtypes.html

get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.

setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
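Read side by side, the two documented behaviours can be modelled in a few lines of Python. `model_get` and `model_setdefault` are hypothetical helpers that only mirror the documented semantics; the real methods are implemented in C.

```python
def model_get(d, key, default=None):
    # Pure-Python model of dict.get: never writes to d
    if key in d:
        return d[key]
    return default

def model_setdefault(d, key, default=None):
    # Pure-Python model of dict.setdefault: inserts default when key is absent
    if key not in d:
        d[key] = default
    return d[key]

a = {'x': 1}
print(model_get(a, 'y', 0), a)         # 0 {'x': 1}
print(model_setdefault(a, 'y', 0), a)  # 0 {'x': 1, 'y': 0}
```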

The logic of dict.get is:

if key in a_dict:
    value = a_dict[key] 
else: 
    value = default_value

For example:

In [72]: a_dict = {'mapping':['dict', 'OrderedDict'], 'array':['list', 'tuple']}
In [73]: a_dict.get('string', ['str', 'bytes'])
Out[73]: ['str', 'bytes']
In [74]: a_dict.get('array', ['str', 'bytes'])
Out[74]: ['list', 'tuple']

The mechanism of setdefault is:

levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']
# group them by the leading letter
group_by_leading_letter = {}
# the logic expressed with an explicit if condition
for level in levels:
    leading_letter = level[0]
    if leading_letter not in group_by_leading_letter:
        group_by_leading_letter[leading_letter] = [level]
    else:
        group_by_leading_letter[leading_letter].append(level)

In [80]: group_by_leading_letter
Out[80]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}

The setdefault dict method exists for precisely this purpose. The preceding for loop can be rewritten as:

In [87]: group_by_leading_letter = {}
In [88]: for level in levels:
    ...:     leading = level[0]
    ...:     group_by_leading_letter.setdefault(leading, []).append(level)
In [89]: group_by_leading_letter
Out[89]: {'a': ['accountant', 'assistant'], 'm': ['master', 'manager'], 's': ['salesman']}

It's very simple: the element is appended either to the existing non-empty list or to the empty list that setdefault has just inserted.

A defaultdict makes this even easier. To create one, you pass a type or function that generates the default value for each slot in the dict:

from collections import defaultdict
group_by_leading_letter = defaultdict(list)
for level in levels:
    group_by_leading_letter[level[0]].append(level)

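As a quick sketch under the same sample data, the setdefault and defaultdict groupings produce equal mappings. `dict()` turns the defaultdict back into a plain dict if later code needs one, for example so that missing-key lookups raise KeyError again.

```python
from collections import defaultdict

levels = ['master', 'manager', 'salesman', 'accountant', 'assistant']

by_setdefault = {}
for level in levels:
    by_setdefault.setdefault(level[0], []).append(level)

by_defaultdict = defaultdict(list)
for level in levels:
    by_defaultdict[level[0]].append(level)

plain = dict(by_defaultdict)  # plain dict: no auto-insertion on lookup
print(plain == by_setdefault)  # True
```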
In [1]: person_dict = {}

In [2]: person_dict['liqi'] = 'LiQi'

In [3]: person_dict.setdefault('liqi', 'Liqi')
Out[3]: 'LiQi'

In [4]: person_dict.setdefault('Kim', 'kim')
Out[4]: 'kim'

In [5]: person_dict
Out[5]: {'Kim': 'kim', 'liqi': 'LiQi'}

In [8]: person_dict.get('Dim', '')
Out[8]: ''

In [9]: person_dict
Out[9]: {'Kim': 'kim', 'liqi': 'LiQi'}

There is no strict answer to this question. They both accomplish the same purpose: they can both be used to deal with missing keys. The only difference I have found is that with setdefault(), the key you look up is automatically inserted if it was not already in the dictionary, whereas with get() it is not. Here is an example with setdefault():

>>> myDict = {'A': 'GOD', 'B':'Is', 'C':'GOOD'} #(1)
>>> myDict.setdefault('C')  #(2)
'GOOD'
>>> myDict.setdefault('C','GREAT')  #(3)
'GOOD'
>>> myDict.setdefault('D','AWESOME') #(4)
'AWESOME'
>>> myDict #(5)
{'A': 'GOD', 'B': 'Is', 'C': 'GOOD', 'D': 'AWESOME'}
>>> myDict.setdefault('E')
>>>
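The last call prints nothing because setdefault without a default uses None, and the REPL does not echo None. This sketch shows that the key is still inserted.

```python
myDict = {'A': 'GOD', 'B': 'Is', 'C': 'GOOD'}

result = myDict.setdefault('E')  # no default given, so None is used
print(result)          # None
print('E' in myDict)   # True: the key was inserted anyway
print(myDict['E'])     # None
```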

get():

>>> myDict = {'a': 1, 'b': 2, 'c': 3}   #(1)
>>> myDict.get('a',0)   #(2)
1
>>> myDict.get('d',0)   #(3)
0
>>> myDict #(4)
{'a': 1, 'b': 2, 'c': 3}

Here is my conclusion: there is no definitive answer as to which one is best when it comes to imputing default values. The only difference is that setdefault() automatically adds any new key, with a default value, to the dictionary, while get() does not. For more information, please go here!

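Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.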

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM