
Why is iterating over a dict so slow?

I have a script that deletes a lot of keys from a dict and eventually iterates over what is left.

I've managed to reduce it to a simple benchmark:

> py -m timeit -s "a = {i:i for i in range(10000000)};[a.pop(i) for i in range(10000000-1)]" "next(iter(a))"
10 loops, best of 5: 30.8 msec per loop

Why does fetching a single key become slow after I've deleted all the keys that came before it?

Since CPython 3.6, dictionaries are implemented with an internal hash table plus a separate array of entries kept in insertion order.

When a key is removed from the dictionary, its entry is actually replaced in the array with a dummy value marking the entry as deleted.

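During iteration, the dictionary walks that array from the start and skips these dummy values one by one until it finds the next real item. In the benchmark above, that means stepping over 9,999,999 deleted entries just to reach the single remaining key.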
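To make the skipping concrete, here is a toy model in pure Python. It is only a sketch of the idea described above, not CPython's actual C implementation: `ToyDict`, `DUMMY`, and `first_key` are hypothetical names invented for illustration, and a regular dict stands in for the hash table.

```python
# Toy model only: CPython's real dict is written in C and differs in detail.
DUMMY = object()  # hypothetical marker for a deleted slot


class ToyDict:
    def __init__(self, items):
        self.entries = []  # insertion-ordered array of (key, value) slots
        self.index = {}    # stands in for the hash table: key -> slot position
        for key, value in items:
            self.index[key] = len(self.entries)
            self.entries.append((key, value))

    def pop(self, key):
        pos = self.index.pop(key)
        _, value = self.entries[pos]
        self.entries[pos] = DUMMY  # the slot is marked, not removed
        return value

    def first_key(self):
        """Return (key, slots_skipped), mimicking next(iter(d))."""
        skipped = 0
        for slot in self.entries:
            if slot is DUMMY:
                skipped += 1  # every deleted slot costs one step
                continue
            return slot[0], skipped
        raise KeyError("dictionary is empty")


n = 1000
d = ToyDict((i, i) for i in range(n))
for i in range(n - 1):
    d.pop(i)
print(d.first_key())  # (999, 999): 999 dummies skipped to reach key 999
```

In this model the cost of reaching the first live key grows with the number of deleted slots in front of it, which matches the slowdown seen in the benchmark.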

That's why, if you keep the first key and remove only the ones after it, iteration is as fast as on a single-item dictionary:

> py -m timeit -s "a = {i:i for i in range(10000000)};[a.pop(i) for i in range(1,10000000-1)]" "next(iter(a))"
1000000 loops, best of 5: 219 nsec per loop
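The same comparison can be run as a script with the `timeit` module. This is a rough sketch at a smaller size (200,000 keys, chosen arbitrarily so it finishes quickly); `time_first_key` is a hypothetical helper, and the absolute numbers will vary by machine and Python version.

```python
import timeit


def time_first_key(pop_range, n=200_000, number=100):
    """Return (first key, best seconds per call) for next(iter(a))
    after popping every key in pop_range from a fresh n-item dict."""
    a = {i: i for i in range(n)}
    for i in pop_range:
        a.pop(i)
    best = min(timeit.repeat(lambda: next(iter(a)), number=number, repeat=5))
    return next(iter(a)), best / number


n = 200_000
# Every key before the last one removed: iteration must skip n - 1 slots.
slow_key, slow = time_first_key(range(n - 1), n)
# Key 0 kept at the front: iteration finds a live entry immediately.
fast_key, fast = time_first_key(range(1, n - 1), n)

print(f"first key {slow_key}: {slow * 1e9:.0f} ns per call")
print(f"first key {fast_key}: {fast * 1e9:.0f} ns per call")
```

On CPython 3.6 or later, the first figure should come out far larger than the second, since only the first case has deleted entries ahead of the first live key.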

For more information about the internal dictionary structure, see this wonderful answer.
