简体   繁体   中英

Python performance: Try-except or not in?

In one of my classes I have a number of methods that all draw values from the same dictionaries. However, if one of the methods tries to access a value that isn't there, it has to call another method to make the value associated with that key.

I currently have this implemented as follows, where findCrackDepth(tonnage) assigns a value to self.lowCrackDepth[tonnage].

if tonnage not in self.lowCrackDepth:
    self.findCrackDepth(tonnage)
lcrack = self.lowCrackDepth[tonnage]

However, it would also be possible for me to do this as

try:
    lcrack = self.lowCrackDepth[tonnage]
except KeyError:
    self.findCrackDepth(tonnage)
    lcrack = self.lowCrackDepth[tonnage]

I assume there is a performance difference between the two related to how often the values is already in the dictionary. How big is this difference? I'm generating a few million such values (spread across a many dictionaries in many instances of the class), and for each time the value doesn't exist, there are probably two times where it does.

It's a delicate problem to time this because you need care to avoid "lasting side effects" and the performance tradeoff depends on the % of missing keys. So, consider a dil.py file as follows:

def make(percentmissing):
  global d
  d = dict.fromkeys(range(100-percentmissing), 1)

def addit(d, k):
  d[k] = k

def with_in():
  dc = d.copy()
  for k in range(100):
    if k not in dc:
      addit(dc, k)
    lc = dc[k]

def with_ex():
  dc = d.copy()
  for k in range(100):
    try: lc = dc[k]
    except KeyError:
      addit(dc, k)
      lc = dc[k]

def with_ge():
  dc = d.copy()
  for k in range(100):
    lc = dc.get(k)
    if lc is None:
      addit(dc, k)
      lc = dc[k]

and a series of timeit calls such as:

$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_in()'
10000 loops, best of 3: 28 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ex()'
10000 loops, best of 3: 41.7 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ge()'
10000 loops, best of 3: 46.6 usec per loop

this shows that, with 10% missing keys, the in check is substantially the fastest way.

$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_in()'
10000 loops, best of 3: 24.6 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ex()'
10000 loops, best of 3: 23.4 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ge()'
10000 loops, best of 3: 42.7 usec per loop

with just 1% missing keys, the exception approach is marginally fastest (and the get approach remains the slowest one in either case).

So, for optimal performance, unless the vast majority (99%+) of lookups is going to succeed, the in approach is preferable.

Of course, there's another, elegant possibility: adding a dict subclass like...:

class dd(dict):
   def __init__(self, *a, **k):
     dict.__init__(self, *a, **k)
   def __missing__(self, k):
     addit(self, k)
     return self[k]

def with_dd():
  dc = dd(d)
  for k in range(100):
    lc = dc[k]

However...:

$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_dd()'
10000 loops, best of 3: 46.1 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_dd()'
10000 loops, best of 3: 55 usec per loop

...while slick indeed, this is not a performance winner -- it's about even with the get approach, or slower, just with much nicer-looking code to use it. ( defaultdict , semantically analogous to this dd class, would be a performance win if it was applicable, but that's because the __missing__ special method, in that case, is implemented in well optimized C code).

Checking if a key exists is cheaper or at least as cheap as retrieving it. So use the if not in solution which is much cleaner and more readable.

According to your question a key not existing is not an error-like case so there's no good reason to let python raise an exception (even though you catch it immediately), and if you have a if not in check, everyone knows your intention - to get the existing value or otherwise generate it.

When in doubt, profile.

Run a test to see if, in your environment, one runs faster than another.

If it is exceptional, use an exception. If you expect the key to be in there, use try/except, if you don't know whether the key is in there, use not in .

I believe the .get() method of a dict has a parameter for setting the default value. You could use that and have it in one line. I'm not sure how it affects performance though.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM