Exception catching performance in Python

Question

I know exceptions in Python are fast when it comes to the try, but that they may be expensive when it comes to the catch.

Does this mean that:

try:
   some code
except MyException:
   pass

is faster than this?

try:
   some code
except MyException as e:
   pass

Answer 1

In addition to Francesco's answer, it seems that one of the (relatively) expensive parts of the catch is the exception matching:

>>> timeit.timeit('try:\n    raise KeyError\nexcept KeyError:\n    pass', number=1000000 )
1.1587663322268327
>>> timeit.timeit('try:\n    raise KeyError\nexcept:\n    pass', number=1000000 )
0.9180641582179874

Looking at the (CPython 2) disassembly:

>>> def f():
...     try:
...         raise KeyError
...     except KeyError:
...         pass
... 
>>> def g():
...     try:
...         raise KeyError
...     except:
...         pass
... 
>>> dis.dis(f)
  2           0 SETUP_EXCEPT            10 (to 13)

  3           3 LOAD_GLOBAL              0 (KeyError)
              6 RAISE_VARARGS            1
              9 POP_BLOCK           
             10 JUMP_FORWARD            17 (to 30)

  4     >>   13 DUP_TOP             
             14 LOAD_GLOBAL              0 (KeyError)
             17 COMPARE_OP              10 (exception match)
             20 POP_JUMP_IF_FALSE       29
             23 POP_TOP             
             24 POP_TOP             
             25 POP_TOP             

  5          26 JUMP_FORWARD             1 (to 30)
        >>   29 END_FINALLY         
        >>   30 LOAD_CONST               0 (None)
             33 RETURN_VALUE        
>>> dis.dis(g)
  2           0 SETUP_EXCEPT            10 (to 13)

  3           3 LOAD_GLOBAL              0 (KeyError)
              6 RAISE_VARARGS            1
              9 POP_BLOCK           
             10 JUMP_FORWARD             7 (to 20)

  4     >>   13 POP_TOP             
             14 POP_TOP             
             15 POP_TOP             

  5          16 JUMP_FORWARD             1 (to 20)
             19 END_FINALLY         
        >>   20 LOAD_CONST               0 (None)
             23 RETURN_VALUE

Note that the except KeyError block duplicates the raised exception and matches it against KeyError, a step the bare except skips. The same matching happens in the except KeyError as ke case:

>>> def f2():
...     try:
...         raise KeyError
...     except KeyError as ke:
...         pass
... 
>>> dis.dis(f2)
  2           0 SETUP_EXCEPT            10 (to 13)

  3           3 LOAD_GLOBAL              0 (KeyError)
              6 RAISE_VARARGS            1
              9 POP_BLOCK           
             10 JUMP_FORWARD            19 (to 32)

  4     >>   13 DUP_TOP             
             14 LOAD_GLOBAL              0 (KeyError)
             17 COMPARE_OP              10 (exception match)
             20 POP_JUMP_IF_FALSE       31
             23 POP_TOP             
             24 STORE_FAST               0 (ke)
             27 POP_TOP             

  5          28 JUMP_FORWARD             1 (to 32)
        >>   31 END_FINALLY         
        >>   32 LOAD_CONST               0 (None)
             35 RETURN_VALUE

The only difference is a single STORE_FAST to store the exception value (in case of a match). Similarly, with several except clauses:

>>> def f():
...     try:
...         raise ValueError
...     except KeyError:
...         pass
...     except IOError:
...         pass
...     except SomeOtherError:
...         pass
...     except:
...         pass
... 
>>> dis.dis(f)
  2           0 SETUP_EXCEPT            10 (to 13)

  3           3 LOAD_GLOBAL              0 (ValueError)
              6 RAISE_VARARGS            1
              9 POP_BLOCK           
             10 JUMP_FORWARD            55 (to 68)

  4     >>   13 DUP_TOP             
             14 LOAD_GLOBAL              1 (KeyError)
             17 COMPARE_OP              10 (exception match)
             20 POP_JUMP_IF_FALSE       29
             23 POP_TOP             
             24 POP_TOP             
             25 POP_TOP             

  5          26 JUMP_FORWARD            39 (to 68)

  6     >>   29 DUP_TOP             
             30 LOAD_GLOBAL              2 (IOError)
             33 COMPARE_OP              10 (exception match)
             36 POP_JUMP_IF_FALSE       45
             39 POP_TOP             
             40 POP_TOP             
             41 POP_TOP             

  7          42 JUMP_FORWARD            23 (to 68)

  8     >>   45 DUP_TOP             
             46 LOAD_GLOBAL              3 (SomeOtherError)
             49 COMPARE_OP              10 (exception match)
             52 POP_JUMP_IF_FALSE       61
             55 POP_TOP             
             56 POP_TOP             
             57 POP_TOP             

  9          58 JUMP_FORWARD             7 (to 68)

 10     >>   61 POP_TOP             
             62 POP_TOP             
             63 POP_TOP             

 11          64 JUMP_FORWARD             1 (to 68)
             67 END_FINALLY         
        >>   68 LOAD_CONST               0 (None)
             71 RETURN_VALUE

This will duplicate the exception and try to match it against every exception listed, one by one, until it finds a match, which is (probably) what is being hinted at as 'poor catch performance'.
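
As a rough way to check the matching cost on your own interpreter, the sketch below times the same raise when the matching clause comes first and when it comes last. The snippet strings and the measure() helper are made up for this illustration, and absolute numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical handler chains: the same ValueError is matched by the
# first clause in one snippet and by the last clause in the other.
MATCH_FIRST = """
try:
    raise ValueError
except ValueError:
    pass
except KeyError:
    pass
except IOError:
    pass
except LookupError:
    pass
"""

MATCH_LAST = """
try:
    raise ValueError
except KeyError:
    pass
except IOError:
    pass
except LookupError:
    pass
except ValueError:
    pass
"""

def measure(stmt, number=100000, repeat=3):
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timeit.repeat(stmt, number=number, repeat=repeat))

first = measure(MATCH_FIRST)
last = measure(MATCH_LAST)
print("match on first clause: %.4f s" % first)
print("match on last clause:  %.4f s" % last)
```

If the clause-by-clause matching described above matters on your setup, the second figure should come out somewhat larger than the first; treat small gaps as noise.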

Answer 2

I think the two are the same in terms of speed:

>>> timeit.timeit('try:\n    raise KeyError\nexcept KeyError:\n    pass', number=1000000 )
0.7168641227143269
>>> timeit.timeit('try:\n    raise KeyError\nexcept KeyError as e:\n    pass', number=1000000 )
0.7733279216613766
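
A single timeit call per variant can be noisy, so whether a gap of this size means anything depends on the run-to-run spread. A minimal sketch that repeats each measurement and prints the fastest and slowest run (the spread() helper is a name invented here):

```python
import timeit

PLAIN = "try:\n    raise KeyError\nexcept KeyError:\n    pass"
WITH_AS = "try:\n    raise KeyError\nexcept KeyError as e:\n    pass"

def spread(stmt, number=100000, repeat=5):
    """Return (fastest, slowest) total time over `repeat` runs."""
    runs = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(runs), max(runs)

plain_min, plain_max = spread(PLAIN)
as_min, as_max = spread(WITH_AS)
print("except KeyError:      min=%.4f max=%.4f" % (plain_min, plain_max))
print("except KeyError as e: min=%.4f max=%.4f" % (as_min, as_max))
```

If the two min/max ranges overlap, the measurement does not separate the two forms.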

Answer 3

The catch isn't expensive; the parts that appear relatively slow are the creation of the stack trace itself and, if required, the subsequent unwinding of the stack.

All stack-based languages that I'm aware of that allow you to capture stack traces need to perform these operations:

1. When raise is called, collect the stack information. Note that Java 1.7 allows you to suppress stack collection, which is a lot faster, but you lose a lot of useful information. There's no sensible way for the language to know who will catch the exception, so ignoring an exception does not help: the bulk of the work has to be done anyway.
2. If an exception is being raised, unwind the stack, i.e. deallocate all the memory and unwind back until we hit a valid catch.

The catch is minuscule in comparison to the above two operations. Here's some code to demonstrate that performance goes down as the stack depth increases.

#!/usr/bin/env python
import time

import pytest  # test_stack below also needs the pytest-benchmark plugin

max_depth = 10
time_start = [0.0] * (max_depth + 1)
time_total = [0.0] * (max_depth + 1)
depths = list(range(max_depth))

@pytest.mark.parametrize('i', depths)
def test_stack(benchmark, i):
    benchmark.pedantic(catcher2, args=(i, i), rounds=10, iterations=1000)

#@pytest.mark.parametrize('d', depths)
#def test_recursion(benchmark, d):
#    benchmark.pedantic(catcher, args=(d, d), rounds=50, iterations=50)

def catcher(i, depth):
    try:
        ping(i, depth)
    except Exception:
        # Accumulate the time from the raise in thrower() to this handler.
        time_total[depth] += time.perf_counter() - time_start[depth]

def recurse(i, depth):
    # Plain recursion: descend i frames, then raise.
    if i > 0:
        recurse(i - 1, depth)
    thrower(depth)

def catcher2(i, depth):
    try:
        ping(i, depth)
    except Exception:
        time_total[depth] += time.perf_counter() - time_start[depth]

def thrower(depth):
    # Record the moment of the raise (time.clock() was removed in Python 3.8).
    time_start[depth] = time.perf_counter()
    raise Exception('wtf')

def ping(i, depth):
    if i < 1:
        thrower(depth)
    return pong(i, depth)

def pong(i, depth):
    if i < 0:
        thrower(depth)
    # Step by one so that the stack grows with the requested depth.
    return ping(i - 1, depth)

if __name__ == "__main__":
    rounds = 200000
    class_start = time.perf_counter()
    for _ in range(rounds):
        ex = Exception()
    class_time = time.perf_counter() - class_start
    print("%d ex = Exception()'s %f" % (rounds, class_time))

    for depth in range(max_depth):
        #print("Depth %d" % depth)
        for _ in range(rounds):
            catcher(depth, depth)

    for rep in range(max_depth):
        print("depth=%d time=%f" % (rep, time_total[rep]))

The output is below (times are relative). The first line is the time taken to call Exception() 200000 times:

200000 ex = Exception()'s 0.040469

depth=0 time=0.103843
depth=1 time=0.246050
depth=2 time=0.401459
depth=3 time=0.565742
depth=4 time=0.736362
depth=5 time=0.921993
depth=6 time=1.102257
depth=7 time=1.278089
depth=8 time=1.463500
depth=9 time=1.657082

Someone better at Python than me might be able to get py.test to print the timings at the end.
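
For readers without pytest installed, here is a dependency-free sketch of the same idea. The function names raise_at and time_catch are invented for this example, and each figure includes the cost of the recursive calls as well as the raise and the unwinding, so read the results as a trend rather than a precise breakdown.

```python
import time

def raise_at(depth):
    """Recurse `depth` frames down, then raise."""
    if depth <= 0:
        raise Exception("bottom of the stack")
    raise_at(depth - 1)

def time_catch(depth, rounds=20000):
    """Total seconds to raise at `depth` and catch at the top, `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        try:
            raise_at(depth)
        except Exception:
            pass
    return time.perf_counter() - start

timings = {depth: time_catch(depth) for depth in range(10)}
for depth in sorted(timings):
    print("depth=%d time=%f" % (depth, timings[depth]))
```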

Note: there was a very similar question to this asked about Java a few weeks ago. It's a very informative thread regardless of the language used:

Which part of throwing an Exception is expensive?

Answer 4

A Python program is constructed from code blocks. A block is a piece of Python program text that is executed as a unit. In the CPython core, a block is represented by struct basicblock:

cpython/Python/compile.c

typedef struct basicblock_ {
    /* Each basicblock in a compilation unit is linked via b_list in the
       reverse order that the block are allocated.  b_list points to the next
       block, not to be confused with b_next, which is next by control flow. */
    struct basicblock_ *b_list;
    /* number of instructions used */
    int b_iused;
    /* length of instruction array (b_instr) */
    int b_ialloc;
    /* pointer to an array of instructions, initially NULL */
    struct instr *b_instr;
    /* If b_next is non-NULL, it is a pointer to the next
       block reached by normal control flow. */
    struct basicblock_ *b_next;
    /* b_seen is used to perform a DFS of basicblocks. */
    unsigned b_seen : 1;
    /* b_return is true if a RETURN_VALUE opcode is inserted. */
    unsigned b_return : 1;
    /* depth of stack upon entry of block, computed by stackdepth() */
    int b_startdepth;
    /* instruction offset for block, computed by assemble_jump_offsets() */
    int b_offset;
} basicblock;

Loops, try/except and try/finally statements are handled somewhat differently. For these three kinds of statement, frame blocks are used:

cpython/Python/compile.c

enum fblocktype { LOOP, EXCEPT, FINALLY_TRY, FINALLY_END };

struct fblockinfo {
    enum fblocktype fb_type;
    basicblock *fb_block;
};

A code block is executed in an execution frame.

cpython/Include/frameobject.h

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */

    /* In a generator, we need to be able to swap between the exception
       state inside the generator and the exception state of the calling
       frame (which shouldn't be impacted when the generator "yields"
       from an except handler).
       These three fields exist exactly for that, and are unused for
       non-generator frames. See the save_exc_state and swap_exc_state
       functions in ceval.c for details of their use. */
    PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;
    /* Borrowed reference to a generator, or NULL */
    PyObject *f_gen;

    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    char f_executing;           /* whether the frame is still executing */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;

A frame contains some administrative information (used for debugging) and determines where and how execution continues after the code block's execution has completed. When you use 'as' (in 'import something as' or 'except Exception as' statements), you simply perform a name-binding operation, i.e. Python simply adds a reference to the object in the frame's local symbol table (f_locals). Thus there is no meaningful overhead at runtime.
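
To see that 'as' is only a name binding, the following sketch (assuming Python 3; the function name bind_and_leave is made up) keeps its own reference to the exception and then checks the bound name after the handler. One detail beyond the binding itself: Python 3 also unbinds the 'as' name when the except block ends, which is not the case in Python 2.

```python
def bind_and_leave():
    try:
        raise KeyError("missing")
    except KeyError as e:
        bound = e  # our own reference; `e` is just a local name for the same object
    try:
        e  # Python 3 unbinds `e` when the except block ends
    except NameError:  # UnboundLocalError is a subclass of NameError
        still_bound = False
    else:
        still_bound = True
    return bound, still_bound

exc, still_bound = bind_and_leave()
print(type(exc).__name__, still_bound)  # prints "KeyError False" on Python 3
```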

But you will have some overhead at parse time.

cpython/Modules/parsermodule.c

static int
validate_except_clause(node *tree)
{
    int nch = NCH(tree);
    int res = (validate_ntype(tree, except_clause)
               && ((nch == 1) || (nch == 2) || (nch == 4))
               && validate_name(CHILD(tree, 0), "except"));

    if (res && (nch > 1))
        res = validate_test(CHILD(tree, 1));
    if (res && (nch == 4))
        res = (validate_name(CHILD(tree, 2), "as")
               && validate_ntype(CHILD(tree, 3), NAME));

    return (res);
}

But, in my opinion, this can be neglected.
