简体   繁体   中英

Does the 'in' clause used on python dictionaries call the keys() function every time?

Let's say I have a

dict = {...} #lots of words in dictionary  

I have to do a

for word in ...: #long list of words
    if word in dict:
        #do something

My question is, does the 'if word in dict' call the dict.keys() function every time and hence a lot slower than if I added another variable at the top that is dict_keys = dict.keys()? The result of what I am talking about would be this.

dict = {...}
dict_keys = dict.keys()

for word in ...:
    if word in dict_keys:
        #do something

Thanks

no. foo in mydict is in fact a lot faster than foo in keys_list , since dict s are hash tables, so finding the element inside it is O(1) . While foo in keys_list would be O(n) (slower as the number of keys grows bigger)

But you could always test yourself:

$ python -m timeit -s "x = range(1000)" "15 in x"
1000000 loops, best of 3: 0.311 usec per loop
$ python -m timeit -s "x = dict.fromkeys(xrange(1000))" "15 in x"
10000000 loops, best of 3: 0.0515 usec per loop

So checking directly on the dict for 1000 elements is one order of magnitude faster, not considering the time of the .keys()

Actually, builtin dicts just call their __contains__ which directly invokes the C function dict_has_key which is very fast. So, doing it the first way is better because it doesn't force you to evaluate the entire sequence of keys within your dict as calling keys() does.

简答:不,更像__contains__ ,这是O(1)(即便宜)。

It sounds like you're actually trying to iterate over the intersection of the set of words in "# long list of words" and the set of keys in the dictionary. So use the built-in sets and its overloading of the & operator for intersections:

for word in set(some_dict) & set(wordlist):
    # do something

Just based on simple experimentation with code like:

_w= ['a', 'b', 'c', 'd', 'e']
d= {}
w= []
for k in xrange(123):
    for word in _w:
        wk= word+ str(k)
        d[wk]= k
        w.append(wk)

def m1():
    for word in w:
        if word in d:
            pass

def m2(n= 1):
    dk= d.keys()
    for word in w:
        if word in dk:
            pass

and timings:

In []: len(w)
Out[]: 5
In []: %timeit m1()
1000000 loops, best of 3: 657 ns per loop
In []: %timeit m2()
1000000 loops, best of 3: 1.55 us per loop

In []: len(w)
Out[]: 615
In []: %timeit m1()
10000 loops, best of 3: 49.2 us per loop
In []: %timeit m2()
100 loops, best of 3: 5.62 ms per loop

The conclusion will be that m1() is a clear winner (as expected ;-).

OK, this is not really a proof (in a strict sense) of the superiority of m1() . There still exist a slight chance that someone will figure out a set of keys where m1() and m2() execution times are near each other (but I can't figure out any case where m1() would be actually much worse than m2() ). In practice m1() approach allways wins. It's allways engouraged to try out first hard facts of the alternative approaches, then if something contradicts your expectations you'll be more prepared to find why it happens.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM