What is faster: adding a key-value pair or checking for the existence of a key?

Imagine a CSV file with 3 columns: individual name, group name, group ID. Column 1 is different on every line, while columns 2 and 3 can repeat across lines (each group name has its own ID, though). The file is not sorted in any way.

For reasons, I'm creating a dict that maps group ID (key) to group name (value). Which of the following variants is faster?

  1. Checking whether the key already exists and only saving if it does not:

     if ID not in group_dict: group_dict[ID] = name

  2. Just saving it every time (replacing the value, which is the same anyway):

     group_dict[ID] = name
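To make the setup concrete, here is a minimal sketch of both variants applied to a tiny in-memory CSV. The sample rows and the function names `build_with_check` and `build_without_check` are invented for illustration; only the two dict updates come from the question.

```python
import csv
import io

# Hypothetical sample rows: individual name, group name, group ID.
SAMPLE_CSV = """alice,red,17
bob,blue,4
carol,red,17
dave,blue,4
"""


def build_with_check(rows):
    """Variant 1: store a group ID only the first time it is seen."""
    group_dict = {}
    for _name, group_name, group_id in rows:
        if group_id not in group_dict:
            group_dict[group_id] = group_name
    return group_dict


def build_without_check(rows):
    """Variant 2: store the group ID on every row, overwriting the old value."""
    group_dict = {}
    for _name, group_name, group_id in rows:
        group_dict[group_id] = group_name
    return group_dict


rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
# Both variants give the same mapping because a group ID always has the same name.
print(build_with_check(rows) == build_without_check(rows))  # prints True
```

Since every group ID always carries the same group name, the two variants differ only in speed, not in the resulting dict.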

It's really best to profile the code when you have a question like this. Python provides the timeit module, which is useful for this purpose. Here is some code you can experiment with:

import timeit

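setup_code = """
import random

random.seed(0)  # same data for both runs, so the timings are comparable
keysize = 20    # keys are drawn from [0, 2**keysize]; smaller means more repeated keys
valsize = 32
store = dict()
data = [(random.randint(0, 2**keysize), random.randint(0, 2**valsize)) for _ in range(1000000)]
"""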

query = """
for key, val in data:
    if key not in store:
        store[key] = val
"""

no_query = """
for key, val in data:
    store[key] = val
"""


if __name__ == "__main__":
    print(timeit.timeit(stmt=query, setup=setup_code, number=1))
    print(timeit.timeit(stmt=no_query, setup=setup_code, number=1))

The performance of the code depends on how often keys repeat in the data (the "collisions" here are repeated keys, not hash collisions inside the dict). In this code, if you increase keysize, fewer keys repeat and checking the dict first will be slower. Conversely, if you reduce keysize, more keys repeat and checking the dict first starts to perform better. The takeaway is that the number of repeated keys in your data determines which of these approaches is preferable.
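One way to see this effect is to sweep keysize and time both variants on the same data. The following is only a sketch: the helper names (`make_data`, `fill_with_check`, `fill_without_check`, `duplicate_fraction`), the sample size, and the chosen keysize values are assumptions made for illustration, and the timings will vary by machine and Python version.

```python
import random
import timeit


def make_data(keysize, n=200_000, seed=0):
    """Return n (key, value) pairs with keys drawn from [0, 2**keysize]."""
    rng = random.Random(seed)
    return [(rng.randint(0, 2**keysize), rng.randint(0, 2**32)) for _ in range(n)]


def fill_with_check(data):
    """Insert a key only if it is not already in the dict (first value wins)."""
    store = {}
    for key, val in data:
        if key not in store:
            store[key] = val
    return store


def fill_without_check(data):
    """Assign on every pair, overwriting earlier values (last value wins)."""
    store = {}
    for key, val in data:
        store[key] = val
    return store


def duplicate_fraction(data):
    """Fraction of pairs whose key already appeared earlier in data."""
    return 1 - len({key for key, _ in data}) / len(data)


if __name__ == "__main__":
    for keysize in (4, 12, 20, 28):
        data = make_data(keysize)
        t_check = timeit.timeit(lambda: fill_with_check(data), number=3)
        t_plain = timeit.timeit(lambda: fill_without_check(data), number=3)
        print(f"keysize={keysize:2d} duplicates={duplicate_fraction(data):.3f} "
              f"check={t_check:.3f}s plain={t_plain:.3f}s")
```

Printing the duplicate fraction next to the timings makes it easier to relate the results to your own data: if you know roughly how often group IDs repeat in your CSV, compare against the row with a similar fraction.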


 