Which of these is faster, and why? Or are they the same? Does the answer vary by any conditions (size of dictionary, type of data, etc.)?
Traditional:
for key in dict:
x = dict[key]
x = key
Hipster:
for key, value in dict.items():
y = value
y = key
I haven't seen an exact duplicate, but if there is one I'd be happy to be pointed to it.
It turns out there are actually orders of magnitude of difference.
I don't know much about performance testing, but what I tried to do was create 3 dicts of varying sizes, with each smaller dict being a subset of the larger dict. I then ran all three dicts through the two functions (Traditional vs. Hipster). Then I did that 100 times.
The dictionary sizes (number of key-value pairs) for dict1, dict2, and dict3 are 1000, 50000, 500000 respectively.
There seems to be a significant difference, with d.items()
being generally faster and d.items()
being WAY faster on smaller dictionaries. This is in line with expectations (Python generally rewarding "pythonic" code).
Results:
--d[key]--
dict1 -- mean: 0.0001113555802294286, st. dev: 1.9951038526222054e-05
dict2 -- mean: 0.01669296698019025, st. dev: 0.019088713496142
dict3 -- mean: 0.2553815016898443, st. dev: 0.02778986771642094
--d.items()--
dict1 -- mean: 6.005059978633653e-05, st. dev: 1.1960199272812617e-05
dict2 -- mean: 0.00507106617995305, st. dev: 0.009871762371401046
dict3 -- mean: 0.07369932165958744, st. dev: 0.023440325168927384
Code ( repl.it ) providing results:
import timeit
import random
import statistics
def traditional(dicty):
for key in dicty:
x = dicty[key]
x = key
def hipster(dicty):
for key, value in dicty.items():
y = value
y = key
def generate_random_dicts():
random_dict1, random_dict2, random_dict3 = {}, {}, {}
for _ in range(1000):
key = generate_random_str_one_to_ten_chars()
val = generate_random_str_one_to_ten_chars()
random_dict1[key] = val
random_dict2[key] = val
random_dict3[key] = val
for _ in range(49000):
key = generate_random_str_one_to_ten_chars()
val = generate_random_str_one_to_ten_chars()
random_dict2[key] = val
random_dict3[key] = val
for _ in range(450000):
key = generate_random_str_one_to_ten_chars()
val = generate_random_str_one_to_ten_chars()
random_dict3[key] = val
return [random_dict1, random_dict2, random_dict3]
def generate_random_str_one_to_ten_chars():
ret_str = ""
for x in range(random.randrange(1,10,1)):
ret_str += chr(random.randrange(40,126,1))
return ret_str
dict1, dict2, dict3 = generate_random_dicts()
test_dicts = [dict1, dict2, dict3]
times = {}
times['traditional_times'] = {}
times['hipster_times'] = {}
for _ in range(100):
for itr, dictx in enumerate(test_dicts):
start = timeit.default_timer()
traditional(dictx)
end = timeit.default_timer()
time = end - start
try:
times['traditional_times'][f"dict{itr+1}"].append(time)
except KeyError:
times['traditional_times'][f"dict{itr+1}"] = [time]
start = timeit.default_timer()
hipster(dictx)
end = timeit.default_timer()
time = end - start
try:
times['hipster_times'][f"dict{itr+1}"].append(time)
except KeyError:
times['hipster_times'][f"dict{itr+1}"] = [time]
print("--d[key]--")
for x in times['traditional_times'].keys():
ltimes = times['traditional_times'][x]
mean = statistics.mean(ltimes)
stdev = statistics.stdev(ltimes)
print(f"{x} -- mean: {mean}, st. dev: {stdev}\n\n")
print("--d.items()--")
for x in times['hipster_times'].keys():
ltimes = times['hipster_times'][x]
mean = statistics.mean(ltimes)
stdev = statistics.stdev(ltimes)
print(f"{x} -- mean: {mean}, st. dev: {stdev}")
This code only needs to go through the dictionary once to retrieve everything from it:
for key, value in dict.items():
This code goes through the whole dictionary once, but retrieves only keys:
for key in dict:
x = dict[key]
Then, for each key, it has to go into the dictionary again to look up the value. So, it has to be slower.
Still, the whole thing is purely academic and of no real significance in real life. When your application starts being too slow, it is really very very unlikely that the slowness will be caused by the way you iterate through a dictionary.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.