
Why is dict faster than if-else in Python?

I tried to compare dict and if-else to see which is faster, as follows:

d = {
    str  : lambda x: "'%s'" % str(x),
    int  : lambda x: str(x),
    float: lambda x: str(x),
}
items = ['a', 'b', 'c', 1, 2, 3, 4, 5, 1.0]

def use_dict():
    r = []
    for i in items:
        r.append(d[type(i)](i))
    return r

def use_if():
    r = []
    for i in items:
        if isinstance(i, str):
            r.append("'%s'" % str(i))
        elif isinstance(i, (int, float)):
            r.append(str(i))
    return r

if __name__ == '__main__':

    from timeit import timeit

    # timeit() runs each function 1,000,000 times by default
    print('use_dict:', timeit(use_dict))
    # -> use_dict: 9.21109666657

    print('use_if  :', timeit(use_if))
    # -> use_if  : 10.9568739652

I found that dict is faster than if-else. This means that when I want to write a switch statement, dict is the better solution. But I wonder why dict is faster. Can anyone explain it? Thanks.

I am guessing it is because, for elements of items that are not strings, the if..elif solution ends up calling isinstance() twice, which adds to the cost. Function calls in Python are costly.

Your dict solution, by contrast, calls type() only once per item, whatever its type.

As an example, I converted items to a list containing only strings, and this time the if..elif solution was faster:

d = {
    str  : lambda x: "'%s'" % str(x),
    int  : lambda x: str(x),
    float: lambda x: str(x),
}
items1 = ['a', 'b', 'c', 1, 2, 3, 4, 5, 1.0]  # the original mixed list (not used below)
items = ['a','b','c','d','e','f','g','h','i','j']  # strings only

def use_dict():
    r = []
    for i in items:
        r.append(d[type(i)](i))
    return r

def use_if():
    r = []
    for i in items:
        if isinstance(i, str):
            r.append("'%s'" % str(i))
        elif isinstance(i, (int, float)):
            r.append(str(i))
    return r

if __name__ == '__main__':

    from timeit import timeit

    print('use_dict:', timeit(use_dict))

    print('use_if  :', timeit(use_if))

Result of running it on all strings:

C:\Users\anandsk>python a.py
use_dict: 7.891252114975529
use_if  : 6.850442551614534
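The same experiment can be run in the other direction to test the guess above. The sketch below assumes Python 3.6+; the list contents and the `number=100_000` repeat count are arbitrary choices, and the two functions are reworked to take the list as a parameter. It times both versions on a list where every item stops at the first `if` branch and on one where every item needs the second `isinstance()` call. Timings depend on the machine and Python version, so no numbers are claimed here.

```python
from functools import partial
from timeit import timeit

# Same dispatch table as in the question.
d = {
    str: lambda x: "'%s'" % x,
    int: lambda x: str(x),
    float: lambda x: str(x),
}


def use_dict(items):
    r = []
    for i in items:
        # one type() call, one dict lookup and one lambda call per item
        r.append(d[type(i)](i))
    return r


def use_if(items):
    r = []
    for i in items:
        if isinstance(i, str):                # strings stop here (1 check)
            r.append("'%s'" % i)
        elif isinstance(i, (int, float)):     # ints/floats need a 2nd check
            r.append(str(i))
    return r


if __name__ == '__main__':
    # Hypothetical inputs: one list hits only the first branch, the other only the second.
    for name, data in [('strings', ['a'] * 10), ('ints', [1] * 10)]:
        t_dict = timeit(partial(use_dict, data), number=100_000)
        t_if = timeit(partial(use_if, data), number=100_000)
        print(f'{name:8} use_dict: {t_dict:.3f}s  use_if: {t_if:.3f}s')
```

If the isinstance() explanation is right, use_if should fare noticeably worse on the ints list than on the strings list, relative to use_dict.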

If you want to get an idea of how your code executes, take a look at the dis module.

A quick example...

import dis

# Here are the things we might want to do
def do_something_a():
    print('I did a')


def do_something_b():
    print('I did b')


def do_something_c():
    print('I did c')


# Case 1: an if/elif chain
def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()


# Case 2: a dict mapping values to functions
FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}
def f2(x):
    FUNC_MAP[x]()


# Show how the functions execute
print('Case 1')
dis.dis(f1)
print('\n\nCase 2')
dis.dis(f2)

...which outputs (this listing is Python 2 bytecode; newer Python versions produce different instructions)...

Case 1
 18           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 COMPARE_OP               2 (==)
              9 POP_JUMP_IF_FALSE       22

 19          12 LOAD_GLOBAL              0 (do_something_a)
             15 CALL_FUNCTION            0
             18 POP_TOP
             19 JUMP_FORWARD            44 (to 66)

 20     >>   22 LOAD_FAST                0 (x)
             25 LOAD_CONST               2 (2)
             28 COMPARE_OP               2 (==)
             31 POP_JUMP_IF_FALSE       44

 21          34 LOAD_GLOBAL              1 (do_something_b)
             37 CALL_FUNCTION            0
             40 POP_TOP
             41 JUMP_FORWARD            22 (to 66)

 22     >>   44 LOAD_FAST                0 (x)
             47 LOAD_CONST               3 (3)
             50 COMPARE_OP               2 (==)
             53 POP_JUMP_IF_FALSE       66

 23          56 LOAD_GLOBAL              2 (do_something_c)
             59 CALL_FUNCTION            0
             62 POP_TOP
             63 JUMP_FORWARD             0 (to 66)
        >>   66 LOAD_CONST               0 (None)
             69 RETURN_VALUE


Case 2
 29           0 LOAD_GLOBAL              0 (FUNC_MAP)
              3 LOAD_FAST                0 (x)
              6 BINARY_SUBSCR
              7 CALL_FUNCTION            0
             10 POP_TOP
             11 LOAD_CONST               0 (None)
             14 RETURN_VALUE

...so it's easy to see which function has to execute the most instructions.
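To make the same comparison without reading the listing by hand, `dis.get_instructions()` can count the instructions for you. This is a small sketch for Python 3 (`count_ops` is a hypothetical helper). Note that it counts the instructions in the compiled code, not how many actually run for a given `x`, and the exact totals vary between Python versions.

```python
import dis


def do_something_a():
    print('I did a')


def do_something_b():
    print('I did b')


def do_something_c():
    print('I did c')


def f1(x):
    if x == 1:
        do_something_a()
    elif x == 2:
        do_something_b()
    elif x == 3:
        do_something_c()


FUNC_MAP = {1: do_something_a, 2: do_something_b, 3: do_something_c}


def f2(x):
    FUNC_MAP[x]()


def count_ops(func, opname=None):
    """Count bytecode instructions in func (all of them, or only `opname`)."""
    return sum(1 for ins in dis.get_instructions(func)
               if opname is None or ins.opname == opname)


if __name__ == '__main__':
    for func in (f1, f2):
        print(func.__name__, 'instructions:', count_ops(func),
              'comparisons:', count_ops(func, 'COMPARE_OP'))
```

The if/elif version carries one `COMPARE_OP` per branch, while the dict version has none: its single subscript does the lookup.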

As for which is actually faster, that's something you'd have to check by profiling the code.
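For a quick check, `timeit` is often enough before reaching for a full profiler. In this sketch the printing functions are replaced by a hypothetical `noop()` so that `print()` doesn't dominate the measurement, and the chain is timed in its best case (`x=1`) and worst case (`x=3`). The repeat count is arbitrary and results depend on your machine and Python version.

```python
from timeit import timeit


def noop():
    """Hypothetical stand-in for do_something_a/b/c that does no work."""


def f1(x):
    if x == 1:
        noop()
    elif x == 2:
        noop()
    elif x == 3:
        noop()


FUNC_MAP = {1: noop, 2: noop, 3: noop}


def f2(x):
    FUNC_MAP[x]()


if __name__ == '__main__':
    # x=1 is the best case for the if/elif chain, x=3 the worst case
    for x in (1, 3):
        t1 = timeit(lambda: f1(x), number=500_000)
        t2 = timeit(lambda: f2(x), number=500_000)
        print(f'x={x}: if/elif {t1:.3f}s  dict {t2:.3f}s')
```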

The if/elif/else structure compares the key it was given with a sequence of possible values one by one until it finds a match in the condition of some if statement, then reads what it is supposed to execute from inside that if block. This can take a long time, because so many checks (n/2 on average for n possible values, assuming each value is equally likely) have to be made for every lookup.
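The n/2 figure can be checked with a small emulation. `chain_lookup` below is a hypothetical helper that tests (value, result) pairs in order, the way successive `elif` conditions would, and counts the comparisons. Assuming every key is equally likely, the average works out to (n + 1) / 2, which is roughly n/2 for large n.

```python
def chain_lookup(key, cases):
    """Emulate an if/elif chain over (value, result) pairs.

    Tests the pairs in order, like successive elif conditions, and returns
    (result, number_of_comparisons). Raises KeyError if nothing matches.
    """
    comparisons = 0
    for value, result in cases:
        comparisons += 1
        if key == value:
            return result, comparisons
    raise KeyError(key)


n = 10
cases = [(i, 'handler%d' % i) for i in range(n)]

# Look up every key once, i.e. treat all n keys as equally likely.
total = sum(chain_lookup(k, cases)[1] for k in range(n))
print(total / n)  # 5.5, i.e. (n + 1) / 2
```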

A sequence of if statements is harder to optimize than a switch statement because the condition checks (the expressions inside the parentheses, in C++ terms) might change the state of some variable involved in the next check, so they have to be evaluated in order. The restrictions on switch statements remove that possibility, so the order doesn't matter (I think).
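In Python, too, each `elif` condition is an arbitrary expression evaluated in order, so a condition with a side effect changes what the next one sees. The hypothetical `classify` function below uses the same condition twice; which branch wins depends entirely on the state left behind by the earlier check, which is why such a chain can't simply be reordered.

```python
def classify(counter):
    """if/elif chain whose conditions mutate `counter` (a one-element list)."""
    def bump_and_test(target):
        counter[0] += 1              # side effect visible to the next condition
        return counter[0] == target

    if bump_and_test(2):             # first check: counter goes up by 1
        return 'first'
    elif bump_and_test(2):           # identical test, but it sees the new state
        return 'second'
    return 'none'


print(classify([0]))  # second
print(classify([1]))  # first
```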

Python dictionaries are implemented as hash tables. The idea is this: if you could handle arbitrarily large numbers and had infinite RAM, you could create a huge array of function pointers indexed simply by converting your lookup value to an integer and using that as the index. Lookup would be virtually instantaneous.
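When the keys happen to be small non-negative integers, the "infinite array" idea works as-is: a plain list can serve as a direct-address table. A minimal sketch with hypothetical handler names:

```python
def on_zero():
    return 'zero'


def on_one():
    return 'one'


def on_two():
    return 'two'


# Direct-address table: the key itself is the index, so a lookup is
# a single list access with no comparisons.
TABLE = [on_zero, on_one, on_two]


def dispatch(key):
    return TABLE[key]()


print(dispatch(2))  # two
```

This only works because the key space is tiny and dense; arbitrary keys (strings, types, large numbers) are what the hashing step is for.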

You can't do that, of course, but you can create an array of some manageable length, pass the lookup value to a hash function (which produces an integer that depends on the lookup value), then take the result modulo (%) the length of the array to get an index within its bounds. That way, a lookup takes only as long as it takes to call the hash function once, compute the modulus, and jump to an index. If the number of possible lookup values is large enough, the overhead of the hash function becomes negligible compared with those n/2 condition checks.
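The hash-then-modulo step is a single line in Python. In this sketch `TABLE_SIZE = 8` is an arbitrary choice. For small integers CPython's `hash()` returns the integer itself, so the slots are easy to predict; note that 2 and 10 end up in the same slot. (CPython's own dict keeps its table size at a power of two and computes `hash & (size - 1)`, which selects the same slot as `%`.)

```python
TABLE_SIZE = 8


def bucket_index(key, size=TABLE_SIZE):
    """Map any hashable key to a slot in range(size).

    Python's % with a positive divisor is never negative, so this also
    works for keys whose hash is negative.
    """
    return hash(key) % size


for key in (1, 2, 3, 10):
    print(key, bucket_index(key))
# prints: 1 1 / 2 2 / 3 3 / 10 2
```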

(Actually, since many different lookup values will inevitably map to the same index, it's not quite that simple. You have to detect and resolve these collisions, which can be done in a number of ways. Still, the gist of it is as described above.)
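Separate chaining is one of those ways: each slot holds a short list of (key, value) pairs, and only the keys in that one list are compared. The toy `ChainedHashMap` below is a hypothetical sketch (no resizing, no deletion). CPython's dict does not work this way; it uses open addressing, probing other slots of the same table instead of keeping per-slot lists.

```python
class ChainedHashMap:
    """Toy hash table that resolves collisions by separate chaining."""

    def __init__(self, size=8):
        # one (initially empty) list of (key, value) pairs per slot
        self._slots = [[] for _ in range(size)]

    def _bucket(self, key):
        return self._slots[hash(key) % len(self._slots)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key, possibly sharing the slot

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:              # compare only within one short bucket
                return v
        raise KeyError(key)


m = ChainedHashMap(size=8)
m[2] = 'two'
m[10] = 'ten'       # 10 % 8 == 2, so it collides with key 2
print(m[2], m[10])  # two ten
```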

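Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM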