What would be the fastest way to increment counters stored in a dictionary?
Because I have to do this same operation hundreds of thousands of times, I'm looking for something more efficient than what I have below:
def funcA(a):
    keys = [x for x in range(1, 51)]
    adict = {key: 0 for key in keys}
    for k in adict.keys():  # this is the code I would like to improve
        if k <= a:
            adict[k] += 1
        else:
            break
import timeit
number = 100000
t1 = timeit.timeit(
    'funcA(5)',
    setup="from __main__ import funcA", number=number)
print(t1)
# Timing: 0.42629639082588255
Using a list comprehension instead seems to slow everything down a bit, maybe because it lacks the break statement?
def funcB(a):
    # not working: _inc() only rebinds its local x, so adict is never updated
    keys = [x for x in range(1, 51)]
    adict = {key: 0 for key in keys}
    def _inc(x):
        x += 1
        return x
    [_inc(adict[k]) for k in adict.keys() if k <= a]
# Timing: 0.5831785711925477
Note: initially I had if float(k) <= float(a): but since I'm only expecting numbers (integers or floats), removing the float() conversion sped up the code. Is this assumption reasonable?
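On the float() question: Python compares int and float values directly, so the conversion should be redundant as long as both sides really are numbers. A small check, where `is_at_most` is a hypothetical helper name used only for this illustration:

```python
def is_at_most(k, a):
    # Mixed int/float comparison works without an explicit float() conversion.
    return k <= a

print(is_at_most(5, 5.0))  # True
print(is_at_most(5, 5.5))  # True
print(is_at_most(6, 5.5))  # False

# The assumption breaks down if a string sneaks in: Python 3 raises TypeError.
try:
    is_at_most(5, "5")
except TypeError as exc:
    print("TypeError:", exc)
```

If the inputs might arrive as strings, a single conversion before the loop is cheaper than converting inside every comparison.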
Note2: as noted in several comments, the break statement can give unexpected results in the resulting dictionary, so it is better to just do:
def funcA(a):
    keys = [x for x in range(1, 51)]
    adict = {key: 0 for key in keys}
    for k in adict:
        if k <= a:
            adict[k] += 1
# Timing: 0.5132114209700376
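The break problem raised in Note2 can be reproduced with keys that are not stored in ascending order. This sketch assumes Python 3.7 or later, where dictionaries iterate in insertion order; `with_break` and `without_break` are illustrative names, not part of the original code.

```python
def with_break(adict, a):
    for k in adict:
        if k <= a:
            adict[k] += 1
        else:
            break  # stops at the first key > a, even if smaller keys follow
    return adict

def without_break(adict, a):
    for k in adict:
        if k <= a:
            adict[k] += 1
    return adict

keys = [3, 1, 2]  # deliberately not sorted
print(with_break(dict.fromkeys(keys, 0), 2))     # {3: 0, 1: 0, 2: 0}
print(without_break(dict.fromkeys(keys, 0), 2))  # {3: 0, 1: 1, 2: 1}
```

With the keys built from range(1, 51) in order, both loops give the same result on Python 3.7+, but the version without break does not depend on that ordering.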
In your case you could just use the fact that booleans (the result of the comparison) can be simply converted to integers. It may not be the fastest but it's definitely short and "relatively" fast:
def funcA(a):
    adict = {key: int(key <= a) for key in range(1, 51)}
    return adict
The int(key <= a) version assumes that the second function is what you actually want, because the first one could give different results because of the break. Dictionaries are unordered (insertion order is only guaranteed from Python 3.7 on), so the loop could stop before incrementing some values whose keys are smaller than or equal to a. Also, this version doesn't increment the values; it just sets them to 1 or 0, because you don't actually need addition in this case.
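The boolean-to-integer conversion works because bool is a subclass of int in Python, so True and False behave as 1 and 0. A quick illustration on a smaller key range (the variable names are arbitrary):

```python
a = 3
flags = {key: int(key <= a) for key in range(1, 6)}
print(flags)  # {1: 1, 2: 1, 3: 1, 4: 0, 5: 0}

print(isinstance(True, int))  # True
print(True + True)            # 2

# Booleans can be summed directly, e.g. to count how many keys are <= a.
count = sum(key <= a for key in range(1, 6))
print(count)  # 3
```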
However, that comprehension is not necessarily the fastest way, because it has to do a lot of function calls and int lookups. So I'll present some more equivalent operations in order of performance (fastest to slowest):
def cached_version():
    range_cache = range(1, 51)
    cache = dict.fromkeys(range_cache, 0)

    def inner(a):
        adict = cache.copy()
        for key in range_cache[:a]:  # requires a to be an integer!
            adict[key] = 1
        return adict
    return inner

func1 = cached_version()  # initialize cache

def func2(a):
    keys = range(1, 51)
    adict = dict.fromkeys(keys[:a], 1)  # requires a to be an integer!
    for key in keys[a:]:
        adict[key] = 0
    return adict

def func3(a):
    adict = {}
    for key in range(1, 51):
        if key <= a:
            adict[key] = 1
        else:
            adict[key] = 0
    return adict

def func4(a):
    return {key: 1 if key <= a else 0 for key in range(1, 51)}

def func5(a):
    keys = range(1, 51)
    adict = dict.fromkeys(keys[:a], 1)  # requires a to be an integer!
    adict.update(dict.fromkeys(keys[a:], 0))
    return adict

def func6(a):
    # 50 keys in total, so 50 - a zeros are needed; requires a to be an integer!
    return dict(zip(range(1, 51), [1]*a + [0]*(50-a)))

from itertools import chain

def func7(a):
    # 50 keys in total, so 50 - a zeros are needed; requires a to be an integer!
    return dict(zip(range(1, 51), chain([1]*a, [0]*(50-a))))

def func8(a):  # the one I originally mentioned
    adict = {key: int(key <= a) for key in range(1, 51)}
    return adict
The timings were done on Python 3.5 on Windows 10; there could be differences on other machines and other Python versions. Also note that the performance could be totally different if you had more keys instead of just range(1, 51).
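Since the results depend on the machine and Python version, it may be worth re-running the comparison locally. Below is a minimal harness, assuming three of the variants are redefined so that the snippet is self-contained; `variants`, `reference`, and the repeat counts are illustrative choices, and no particular timings are claimed.

```python
import timeit

def func2(a):
    keys = range(1, 51)
    adict = dict.fromkeys(keys[:a], 1)  # requires a to be an integer
    for key in keys[a:]:
        adict[key] = 0
    return adict

def func4(a):
    return {key: 1 if key <= a else 0 for key in range(1, 51)}

def func8(a):
    return {key: int(key <= a) for key in range(1, 51)}

variants = [func2, func4, func8]

# Check that every variant produces the same dictionary before timing it.
reference = func4(5)
for func in variants:
    assert func(5) == reference, func.__name__

for func in variants:
    # Taking the minimum of several repeats reduces noise from other processes.
    best = min(timeit.repeat(lambda: func(5), number=10000, repeat=3))
    print("{}: {:.4f} s per 10000 calls".format(func.__name__, best))
```

Changing the argument or the key range in the snippet is a quick way to see whether the ranking holds for other inputs.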