setdefault vs defaultdict performance

I am writing code for an application where performance is important. I am wondering why defaultdict seems to be faster than setdefault.

I would like to be able to use setdefault, mostly because I do not like the print output of the nested defaultdict (see implementation below).

In my code, I need to test whether element_id is already a key of the dict.

Here are the two functions that I am testing:

def defaultdictfunc(subcases,other_ids,element_ids):
    dict_name= defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for subcase in subcases:
        for other_id in other_ids:
            for element_id in element_ids: 
                if element_id in dict_name[subcase][other_id]:
                    # error duplicate element_id
                    pass
                else:
                    dict_name[subcase][other_id][element_id]=0
    return dict_name

def setdefaultfunc(subcases,other_ids,element_ids):
    dict_name={}
    for subcase in subcases:
        for other_id in other_ids:
            for element_id in element_ids: 
                if element_id in dict_name.setdefault(subcase,{}).setdefault(other_id,{}):
                    # error duplicate element_id
                    pass
                else:
                    dict_name[subcase][other_id][element_id]=0

    return dict_name

IPython input and output:

In [1]: from numpy.random import randint

In [2]: subcases,other_ids,element_ids=(randint(0,100,100),randint(0,100,100),randint(0,100,100))

In [5]: from collections import defaultdict

In [6]: defaultdictfunc(subcases,other_ids,element_ids)==setdefaultfunc(subcases,other_ids,element_ids)
Out[6]: True

In [7]: %timeit defaultdictfunc(subcases,other_ids,element_ids)
10 loops, best of 3: 177 ms per loop

In [8]: %timeit setdefaultfunc(subcases,other_ids,element_ids)
1 loops, best of 3: 351 ms per loop

Why is setdefaultfunc slower? I thought the underlying code would be the same. Is there a way to improve its speed?

Thanks

According to user aneroid:

It would make sense that defaultdict is faster than dict.setdefault() since the former sets its default for the entire dict at creation time, whereas setdefault() does it per element when it is read. One reason to use setdefault is when the default you assign is based on the key (or something) rather than a generic default for the entire dict.

setdefaultfunc is slower because it calls the dict constructor several times inside the loop: {} is equivalent to dict(), and that argument is built on every call to setdefault, whether or not the key already exists. defaultdict avoids this by design, since it creates a default only when a key is missing.
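The point about the repeated constructor call can be checked directly. The following is a minimal sketch, not part of the original benchmark: `counting_factory` and the `calls` counter are hypothetical names introduced here only to count how many default objects get built. It relies on the fact that Python evaluates call arguments before the call, so `setdefault` receives a freshly built default every time, while `defaultdict` invokes its factory only for missing keys.

```python
from collections import defaultdict

calls = {"setdefault": 0, "defaultdict": 0}  # hypothetical counters for this demo


def counting_factory(label):
    """Return a factory that builds an empty dict and records each call."""
    def factory():
        calls[label] += 1
        return {}
    return factory


keys = [1, 2, 1, 2, 1, 2]  # two distinct keys, each seen three times

# dict.setdefault: the default argument is evaluated on every call,
# even when the key is already present.
plain = {}
make_default = counting_factory("setdefault")
for key in keys:
    plain.setdefault(key, make_default())

# defaultdict: the factory runs only when a key is missing.
nested = defaultdict(counting_factory("defaultdict"))
for key in keys:
    nested[key]

print(calls)  # {'setdefault': 6, 'defaultdict': 2}
```

Both containers end up with the same contents; only the number of throwaway dicts differs.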

With a small change, you can easily improve setdefaultfunc:

def setdefaultfunc2(subcases,other_ids,element_ids):
    dict_name={}
    for subcase in subcases:
        subcase_dict = dict_name.setdefault(subcase,{})
        for other_id in other_ids:
            other_id_dict = subcase_dict.setdefault(other_id,{})
            for element_id in element_ids: 
                if element_id in other_id_dict:
                    # error duplicate element_id
                    pass
                else:
                    other_id_dict[element_id]=0
    return dict_name

With this change, the results on my machine were:

In [37]: defaultdictfunc(subcases,other_ids,element_ids)==setdefaultfunc2(subcases,other_ids,element_ids)
Out[37]: True

In [38]: %timeit defaultdictfunc(subcases,other_ids,element_ids)
286 ms ± 8.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [39]: %timeit setdefaultfunc(subcases,other_ids,element_ids)
434 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [40]: %timeit setdefaultfunc2(subcases,other_ids,element_ids)
174 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

IMO, defaultdict does not provide enough of a performance gain to make it worth using.
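If the main objection to defaultdict is how the nested structure prints, one possible workaround is to build with defaultdict and convert to plain dicts once at the end. The sketch below is only an illustration under that assumption: `to_plain_dict` is a hypothetical helper name, not something from the question or from the standard library, and the extra conversion pass has its own cost that is not measured here.

```python
from collections import defaultdict


def to_plain_dict(obj):
    """Recursively convert defaultdict (or dict) levels into plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj  # leaf values are returned unchanged


nested = defaultdict(lambda: defaultdict(dict))
nested["subcase"]["other"]["element"] = 0

plain = to_plain_dict(nested)
print(plain)  # {'subcase': {'other': {'element': 0}}}
```

Because defaultdict is a subclass of dict, the `isinstance` check covers every nesting level.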

val = 20_000_000


def defaultdict():
    """
    defaultdict 1000000: 0.4460279941558838
    defaultdict 10000000: 4.371468782424927
    defaultdict 20000000: 8.807381391525269
    """
    from collections import defaultdict
    import time
    a = defaultdict(list)
    t = time.time()
    for i in range(val):
        key = i % (val / 2)
        a[key].append(i)
    print(f'defaultdict {val}:', time.time() - t)


def setdefault():
    """
    setdefault 1000000: 0.3767530918121338
    setdefault 10000000: 4.230009078979492
    setdefault 20000000: 8.19938588142395
    """
    import time
    a = {}
    t = time.time()
    for i in range(val):
        key = i % (val / 2)
        a.setdefault(key, []).append(i)
    print(f'setdefault {val}:', time.time() - t)

Python 3.10
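Single runs timed with `time.time()` can be noisy, so the comparison above may be easier to reproduce with the standard-library `timeit` module. This is a sketch under stated assumptions, not the code that produced the numbers above: `fill_defaultdict`, `fill_setdefault` and `N` are hypothetical names, `N` is far smaller than the 20_000_000 used above so the script finishes quickly, and the keys use integer division (`n // 2`) where the code above uses `val / 2`, which yields float keys. No timings are claimed here; results will vary by machine and Python version.

```python
import timeit
from collections import defaultdict

N = 100_000  # assumed size, much smaller than the 20_000_000 used above


def fill_defaultdict(n=N):
    """Group integers by key using defaultdict(list)."""
    groups = defaultdict(list)
    for i in range(n):
        groups[i % (n // 2)].append(i)
    return groups


def fill_setdefault(n=N):
    """Group integers by key using dict.setdefault."""
    groups = {}
    for i in range(n):
        groups.setdefault(i % (n // 2), []).append(i)
    return groups


if __name__ == "__main__":
    for func in (fill_defaultdict, fill_setdefault):
        # repeat() returns one total per run; the minimum is the least noisy.
        best = min(timeit.repeat(func, number=5, repeat=3))
        print(f"{func.__name__}: {best / 5:.4f} s per call")
```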
