
Speed of retrieving dictionary items with try…except vs. key in dict?

I frequently need to check whether a key is in a dictionary, and do something else if it is not. In Python, there are two obvious ways to do this:

if key in dict_:
    value = dict_[key]
    do_something
else:
    do_something_else

or

try:
    value = dict_[key]
except KeyError:
    do_something_else
else:
    do_something

Which of these is faster/preferable? Does it depend on the dictionary size?

It seems there may be two competing effects here: (1) with the membership check, a key that is present gets looked up twice, vs. (2) with try...except, a missing key pays the cost of raising and handling an exception.

You can benchmark three different methods with timeit: get1 is the try...except version, get2 uses the built-in .get, and get3 checks for the key first.

In [1]: def get1(d, k): 
......:     try: 
......:         return d[k] 
......:     except KeyError: 
......:         return None 

In [3]: def get2(d, k): 
......:     return d.get(k) 

In [4]: def get3(d, k): 
......:    if k in d: return d[k]  
......:    else: return None 
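(The dictionaries themselves aren't shown in the session; a plausible setup, consistent with the sizes quoted below, is integer keys mapping to themselves:)

little_d = {i: i for i in range(100)}       # "small" dict: 100 entries
big_d = {i: i for i in range(1_000_000)}    # "bigger" dict: 1M entries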

On a small dictionary (100 elements):

In [8]: %timeit -n 100 [get1(little_d, e) for e in range(len(little_d))]                
18.8 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %timeit -n 100 [get2(little_d, e) for e in range(len(little_d))]                
22.5 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit -n 100 [get3(little_d, e) for e in range(len(little_d))]               
19.3 µs ± 862 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

And a bigger one (1M elements), looking up the same 100 keys so that only the dictionary size changes:

In [11]: %timeit -n 100 [get1(big_d, e) for e in range(len(little_d))]
19.4 µs ± 469 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [12]: %timeit -n 100 [get2(big_d, e) for e in range(len(little_d))]
21.8 µs ± 241 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [13]: %timeit -n 100 [get3(big_d, e) for e in range(len(little_d))]
19.2 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

On the (1M) dictionary with ~50% "misses" ( choices = random.choices(range(0, 2*len(big_d)), k=len(big_d)) ), the differences are larger:
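Spelled out as a runnable setup (assuming the construction above):

import random

# About half of these keys fall outside big_d's key range (0..1M-1),
# so roughly 50% of the lookups below are misses.
choices = random.choices(range(0, 2 * len(big_d)), k=len(big_d))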

In [20]: %timeit -n 100 [get1(big_d, e) for e in choices]                              
514 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [21]: %timeit -n 100 [get2(big_d, e) for e in choices]                              
416 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [22]: %timeit -n 100 [get3(big_d, e) for e in choices]                              
367 ms ± 4.89 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Calling .get directly (without the get2 wrapper) is faster still:

In [23]: %timeit -n 100 [big_d.get(e) for e in choices]
334 ms ± 3.6 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

The three approaches are close: dictionary size barely affects any of them, but misses affect them differently. The only real impact size would have is once Python begins thrashing memory (as it probably does here). Note that get2 is slower simply because of the overhead of two function calls (get2 itself plus .get), and function calls are relatively expensive in Python, as the last test shows. Misses produce slower results for a variety of reasons, as @user2864740 points out.

tl;dr

I would use .get .
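Applied to the question's original pattern, that might look like this (using a sentinel so that a stored value of None isn't mistaken for a missing key; _MISSING is an illustrative name, not from the answer):

_MISSING = object()  # unique sentinel: never equal to any value stored in the dict

value = dict_.get(key, _MISSING)
if value is _MISSING:
    do_something_else()
else:
    do_something(value)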


The answer will also depend heavily on the speed of the __hash__ and __eq__ implementations of your keys. If __hash__ and __eq__ are slow, the cost of hashing the key twice in the check-then-index approach may show, making method 1 or 2 (which hash the key only once) better.
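A contrived way to see this (my illustration, not from the answers above): give the key type an artificially expensive __hash__, and the double lookup in method 3 pays roughly twice the hashing cost per hit.

class SlowKey:
    """Key with a deliberately expensive __hash__, to magnify hashing cost."""

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        h = hash(self.v)
        for _ in range(10_000):  # artificial busywork; stands in for hashing a large string/tuple
            h = hash((h, self.v))
        return h

    def __eq__(self, other):
        return isinstance(other, SlowKey) and self.v == other.v

d = {SlowKey(i): i for i in range(100)}
k = SlowKey(50)

# method 3 style: hashes k twice (once for `in`, once for the index)
if k in d:
    _ = d[k]

# method 1/2 style: hashes k once
_ = d.get(k)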
