setdefault vs defaultdict performance
I am writing code for an application where performance is important. I am wondering why defaultdict seems to be faster than setdefault.
I would like to be able to use setdefault, mostly because I do not like the print output of the nested defaultdict (see implementation below).
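If the only objection to defaultdict is how a nested one prints, one option is to build the structure with defaultdict and convert it to plain dicts once at the end. This is a minimal sketch; to_plain_dict is a hypothetical helper (not part of collections), and the conversion costs one extra pass over the whole structure.

```python
from collections import defaultdict


def to_plain_dict(d):
    """Hypothetical helper: recursively turn (nested) defaultdicts into plain dicts."""
    if isinstance(d, dict):  # defaultdict is a subclass of dict
        return {k: to_plain_dict(v) for k, v in d.items()}
    return d  # leaf values (here the 0 markers) are returned unchanged


# Same three-level nesting as in defaultdictfunc.
nested = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
nested["a"]["b"]["c"] = 0

plain = to_plain_dict(nested)
print(plain)  # {'a': {'b': {'c': 0}}}
```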
In my code, I need to test whether element_id is already a key of the dict.
Here are the two functions that I am testing:
from collections import defaultdict

def defaultdictfunc(subcases, other_ids, element_ids):
    dict_name = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for subcase in subcases:
        for other_id in other_ids:
            for element_id in element_ids:
                if element_id in dict_name[subcase][other_id]:
                    # error duplicate element_id
                    pass
                else:
                    dict_name[subcase][other_id][element_id] = 0
    return dict_name

def setdefaultfunc(subcases, other_ids, element_ids):
    dict_name = {}
    for subcase in subcases:
        for other_id in other_ids:
            for element_id in element_ids:
                if element_id in dict_name.setdefault(subcase, {}).setdefault(other_id, {}):
                    # error duplicate element_id
                    pass
                else:
                    dict_name[subcase][other_id][element_id] = 0
    return dict_name
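One detail matters when comparing the two functions: in defaultdictfunc, the membership test itself creates the subcase and other_id levels, because indexing a missing key on a defaultdict calls its factory, while the in operator on the innermost dict never inserts anything. That side effect is also why setdefaultfunc, which creates the same two levels with setdefault before testing, returns an equal dict. A small sketch; the keys "x", "y" and 5 are purely illustrative.

```python
from collections import defaultdict

d = defaultdict(lambda: defaultdict(dict))

# Indexing d["x"]["y"] runs both factories and inserts both levels,
# even though the code only asks a question.
found = 5 in d["x"]["y"]

print(found)             # False
print("x" in d)          # True: created by d["x"]
print("y" in d["x"])     # True: created by d["x"]["y"]
print(d["x"]["y"])       # {}: `in` on the innermost dict inserted nothing
```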
IPython input and output:
In [1]: from numpy.random import randint
In [2]: subcases,other_ids,element_ids=(randint(0,100,100),randint(0,100,100),randint(0,100,100))
In [5]: from collections import defaultdict
In [6]: defaultdictfunc(subcases,other_ids,element_ids)==setdefaultfunc(subcases,other_ids,element_ids)
Out[6]: True
In [7]: %timeit defaultdictfunc(subcases,other_ids,element_ids)
10 loops, best of 3: 177 ms per loop
In [8]: %timeit setdefaultfunc(subcases,other_ids,element_ids)
1 loops, best of 3: 351 ms per loop
Why is setdefaultfunc slower? I thought the underlying code would be the same. Is there a way to improve its speed?
Thanks
According to user aneroid:
It would make sense that defaultdict is faster than dict.setdefault(), since the former sets its default for the entire dict at creation time, whereas setdefault() does it per element when it is read. One reason to use setdefault is when the default you assign is based on the key (or on something else available at the call site) rather than a generic default for the entire dict.
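A small sketch of the key-dependent case mentioned above (the word list and the "uppercase first letter" default are illustrative assumptions). With setdefault the default is an ordinary argument, so it can use the key; the default_factory of a defaultdict is called with no arguments and cannot. A dict subclass that defines __missing__, which dict lookup calls with the missing key, gets key-dependent defaults lazily.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana"]

# setdefault: the default is built at the call site, so it can depend on the key.
# Note that [w[0].upper()] is still constructed on every iteration.
index = {}
for w in words:
    index.setdefault(w[0], [w[0].upper()]).append(w)
print(index)  # {'a': ['A', 'apple', 'avocado'], 'b': ['B', 'banana']}

# defaultdict: list() is called with no arguments and never sees the key.
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)


class KeyedDefault(dict):
    """Illustrative dict subclass: __missing__ receives the missing key."""

    def __missing__(self, key):
        value = self[key] = [key.upper()]  # built only on a miss
        return value


keyed = KeyedDefault()
for w in words:
    keyed[w[0]].append(w)
print(keyed == index)  # True
```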
setdefaultfunc is slower because it builds new dicts over and over inside the innermost loop: every setdefault call evaluates its {} argument (equivalent to dict()) before the call, even when the key already exists and the new dict is immediately thrown away. defaultdict avoids this by design, because its factory only runs when a key is missing.
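To make the difference concrete: with the question's inputs the innermost loop of setdefaultfunc runs 100 * 100 * 100 = 1,000,000 times and evaluates {} twice per pass, so roughly two million throwaway dicts are built. The sketch below counts constructions with a hypothetical CountingFactory helper (not a library class) on a tiny key list.

```python
from collections import defaultdict


class CountingFactory:
    """Hypothetical helper: builds empty dicts and counts how many it made."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return {}


keys = [1, 2, 1, 2, 1, 2]

# setdefault: the argument make() is evaluated before every call, hit or miss.
make = CountingFactory()
d = {}
for k in keys:
    d.setdefault(k, make())
print(make.calls)  # 6

# defaultdict: the factory only runs when a key is missing.
make_dd = CountingFactory()
dd = defaultdict(make_dd)
for k in keys:
    _ = dd[k]
print(make_dd.calls)  # 2
```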
With a small change, hoisting each setdefault call out of the loops below it, you can easily improve setdefaultfunc:
def setdefaultfunc2(subcases, other_ids, element_ids):
    dict_name = {}
    for subcase in subcases:
        subcase_dict = dict_name.setdefault(subcase, {})
        for other_id in other_ids:
            other_id_dict = subcase_dict.setdefault(other_id, {})
            for element_id in element_ids:
                if element_id in other_id_dict:
                    # error duplicate element_id
                    pass
                else:
                    other_id_dict[element_id] = 0
    return dict_name
With this change, the results on my machine were:
In [37]: defaultdictfunc(subcases,other_ids,element_ids)==setdefaultfunc2(subcases,other_ids,element_ids)
Out[37]: True
In [38]: %timeit defaultdictfunc(subcases,other_ids,element_ids)
286 ms ± 8.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [39]: %timeit setdefaultfunc(subcases,other_ids,element_ids)
434 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [40]: %timeit setdefaultfunc2(subcases,other_ids,element_ids)
174 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
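For readers without IPython, here is a sketch of the same three-way comparison using the standard timeit module. Assumptions: stdlib random stands in for numpy.random.randint, n=50 keeps the run short, and the duplicate-handling branch is collapsed into a "not in" test. No timings are claimed; results depend on the machine and Python version.

```python
import random
import timeit
from collections import defaultdict
from functools import partial


def defaultdictfunc(subcases, other_ids, element_ids):
    d = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for subcase in subcases:
        for other_id in other_ids:
            for element_id in element_ids:
                if element_id not in d[subcase][other_id]:
                    d[subcase][other_id][element_id] = 0
    return d


def setdefaultfunc(subcases, other_ids, element_ids):
    d = {}
    for subcase in subcases:
        for other_id in other_ids:
            for element_id in element_ids:
                # Two throwaway {} are evaluated on every pass.
                if element_id not in d.setdefault(subcase, {}).setdefault(other_id, {}):
                    d[subcase][other_id][element_id] = 0
    return d


def setdefaultfunc2(subcases, other_ids, element_ids):
    d = {}
    for subcase in subcases:
        subcase_dict = d.setdefault(subcase, {})  # once per subcase
        for other_id in other_ids:
            other_id_dict = subcase_dict.setdefault(other_id, {})  # once per pair
            for element_id in element_ids:
                if element_id not in other_id_dict:
                    other_id_dict[element_id] = 0
    return d


def main(n=50, seed=0):
    rng = random.Random(seed)
    args = tuple([rng.randrange(100) for _ in range(n)] for _ in range(3))
    for func in (defaultdictfunc, setdefaultfunc, setdefaultfunc2):
        best = min(timeit.repeat(partial(func, *args), number=1, repeat=3))
        print(f"{func.__name__}: {best * 1000:.1f} ms")


if __name__ == "__main__":
    main()
```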
IMO, defaultdict does not provide enough of a performance gain to make it worth using.
val = 20_000_000

# Each docstring records wall-clock seconds measured for val = 1, 10 and 20 million.
# The function name shadows collections.defaultdict at module level; the import
# inside the function body still refers to the collections class.
def defaultdict():
    """
    defaultdict 1000000: 0.4460279941558838
    defaultdict 10000000: 4.371468782424927
    defaultdict 20000000: 8.807381391525269
    """
    from collections import defaultdict
    import time
    a = defaultdict(list)
    t = time.time()
    for i in range(val):
        key = i % (val / 2)
        a[key].append(i)
    print(f'defaultdict {val}:', time.time() - t)

def setdefault():
    """
    setdefault 1000000: 0.3767530918121338
    setdefault 10000000: 4.230009078979492
    setdefault 20000000: 8.19938588142395
    """
    import time
    a = {}
    t = time.time()
    for i in range(val):
        key = i % (val / 2)
        a.setdefault(key, []).append(i)
    print(f'setdefault {val}:', time.time() - t)

if __name__ == "__main__":
    # Run both benchmarks.
    defaultdict()
    setdefault()
Python 3.10
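Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.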