
Why is a list faster than a dict in the same Python code?

I've written a dynamic programming function.

The recursion formula is

T(n) = T(0) * T(n-1) + T(1) * T(n-2) + … + T(n-1) * T(0)

As you can see, the value of T(n) depends on the values of T(0) … T(n-1) .

In this problem, I need to store T(0) … T(n-1) in order to calculate T(n) .

But which data structure is the best?

Assume we have finished calculating T(0) … T(5) and now need to calculate T(6) .

We can store T in the following structure:

T = [1,1,2,5,14,42,0]

T = {0:1,1:1,2:2,3:5,4:14,5:42,6:0}

My first answer was dict , because the time complexity of getting T(k) is O(1) .

However, after testing both list and dict , the results show that list is faster than dict . Why?

I use n = 1000 to test the program.

import timeit

def test(n, T):
    """Fill T[0..n] with T(i) = sum of T(j) * T(i-1-j) for j in 0..i-1; return T[n]."""
    T[0] = 1
    # T[i] depends on T[0] .. T[i-1], so fill the table in increasing order of i.
    for i in range(1, n + 1):
        T[i] = 0  # reset so repeated timeit runs do not accumulate into old values
        for j in range(i):
            T[i] += T[j] * T[i - 1 - j]
    return T[n]

# Initialize T as a list.
T_1 = [0] * 1001

# Initialize T as a dict with the same keys.
T_2 = {i: 0 for i in range(1001)}

t = timeit.timeit(stmt="test(1000, T_1)", setup="from __main__ import test, T_1", number=10)
print("store T with list, total time is:", t)
t = timeit.timeit(stmt="test(1000, T_2)", setup="from __main__ import test, T_2", number=10)
print("store T with dict, total time is:", t)

The running results are:

store T with list, total time is: 6.454328614287078

store T with dict, total time is: 6.761199993081391

Thanks for your help.

TL;DR: Dictionaries hash the key to look up a value, which adds some overhead compared with indexing a list directly. Hash collisions are also possible, and resolving them costs a little more performance.

Long answer:

Hashing:
A dictionary is implemented as a hash table, a data structure that stores its entries in an internal array. It decides which slot to use by passing the key to a hash function and reducing the result to an index within the range of that array. This is a relatively quick way to look up an item by key instead of by position. It is still slower than indexing a list directly, because the key has to be hashed and then compared against the stored key on every lookup, whereas a list only has to compute an offset.
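To see the lookup overhead on its own, without the big-integer arithmetic from the question, here is a minimal sketch that only reads each element once. The names `as_list`, `as_dict`, and `sum_by_index` are made up for this illustration, and the timings will vary with the machine and the Python version, so treat the numbers as a rough indication rather than a benchmark.

```python
import timeit

N = 1001
as_list = [i for i in range(N)]
as_dict = {i: i for i in range(N)}

def sum_by_index(container, n=N):
    """Read container[0] .. container[n-1] once each and sum the values."""
    total = 0
    for i in range(n):
        total += container[i]
    return total

# Both containers hold the same values, so the sums must match.
assert sum_by_index(as_list) == sum_by_index(as_dict)

list_time = timeit.timeit(lambda: sum_by_index(as_list), number=2000)
dict_time = timeit.timeit(lambda: sum_by_index(as_dict), number=2000)
print(f"list lookups: {list_time:.3f}s")
print(f"dict lookups: {dict_time:.3f}s")
```

The same function body runs against both containers, so any difference in the two timings comes from how `container[i]` is resolved.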

Collisions:
Dictionaries cannot avoid collisions entirely in most cases. The internal array can be implemented as an array of linked lists (separate chaining), or another technique such as open addressing can be used to resolve collisions. If the data set never changes, or changes slowly, collisions can be avoided by building a perfect hash function for that specific data set. No single hash function is perfect for every data set, so general-purpose dictionaries such as the one provided in Python must implement collision resolution.
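As a rough illustration of hashing plus collision handling, here is a toy table that uses separate chaining. `ToyHashTable` is a hypothetical teaching class, not how Python's `dict` works internally (CPython uses open addressing rather than chained buckets), and the example assumes CPython, where a small non-negative integer hashes to itself.

```python
class ToyHashTable:
    """A teaching sketch of a hash table with separate chaining."""

    def __init__(self, size=8):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Reduce the hash to a valid slot in the internal array.
        return hash(key) % self.size

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pos, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[pos] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key, possibly sharing the bucket

    def get(self, key):
        # Every lookup hashes the key, then compares keys inside the bucket.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable(size=8)
table.put(1, "one")
table.put(9, "nine")  # on CPython hash(9) % 8 == 1, so it collides with key 1
print(table.get(1), table.get(9))
print(len(table.buckets[1]))  # number of entries sharing bucket 1
```

Even in this small sketch, `get` does more work than a list index: it computes a hash, picks a bucket, and compares keys until it finds a match.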

Which data structure is better? It depends on how your data is mapped.

If you can map it to consecutive integer keys (e.g. 0, 1, 2, 3, 4, 5, ...) with very few gaps, then an array (a list in Python) may be the best option.

If your data set has non-integer keys, a dictionary is the best option. This is what it was designed for.

If you have integer keys with large gaps, a dictionary will save a lot of memory compared to a list, since the list would need a slot for every index up to the largest key, and most of those slots would go unused.
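The memory point for sparse keys can be checked with a small sketch. The keys below are made up for the example, and `sys.getsizeof` reports only the size of the container itself, not the objects it refers to, so the figures are a lower bound that will differ between Python versions.

```python
import sys

# Two values stored under integer keys that are far apart.
sparse_keys = [0, 1_000_000]

as_dict = {key: 1 for key in sparse_keys}

# A list must have a slot for every index up to the largest key.
as_list = [0] * (max(sparse_keys) + 1)
for key in sparse_keys:
    as_list[key] = 1

print("dict bytes:", sys.getsizeof(as_dict))
print("list bytes:", sys.getsizeof(as_list))
```

With only two real entries, the list still pays for about a million slots, while the dict only pays for the entries it actually holds.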


 