I came across a problem on Codewars and am not sure what the difference is between these two possible solutions: one converts the list into a tuple, the other indexes individual elements of the input list.
Problem: convert a list of names (strings) to a statement similar to what Facebook uses to display likes: "Alex likes this", "Alex and John like this", "Alex, John and 2 others like this", etc.
Using an if/elif chain, this is pretty trivial:
if len(names) == 0:
    output_string = "no one likes this"
elif len(names) == 1:
    output_string = str(names[0]) + " likes this"
But for the longer lists of names, you have a choice:
elif len(names) == 2:
    output_string = "%s and %s like this" % (names[0], names[1])
OR
elif len(names) == 3:
    output_string = "%s, %s and %s like this" % tuple(names)
My hypothesis is that it's more computationally efficient to use names[0] etc., because you don't create a new object in memory for the tuple. Is that right?
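For context, here's a complete sketch of the if/elif approach (likes is just my name for the kata's function; the else branch covers the "and N others" case from the problem statement):

def likes(names):
    # Full if/elif chain; the question is only about the
    # two- and three-name branches.
    if len(names) == 0:
        return "no one likes this"
    elif len(names) == 1:
        return "%s likes this" % names[0]
    elif len(names) == 2:
        return "%s and %s like this" % (names[0], names[1])
    elif len(names) == 3:
        return "%s, %s and %s like this" % tuple(names)
    else:
        return "%s, %s and %d others like this" % (names[0], names[1], len(names) - 2)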
CPython optimization rules are usually based on how much work you push to the C layer (versus the bytecode interpreter) and how complex the bytecode instructions are. At low levels of absolute work, the fixed overhead of the interpreter tends to swamp the real work, so intuition derived from experience in lower-level languages just doesn't apply.
It's pretty easy to test, though, especially with IPython's %timeit magic (timings done on Python 3.8.5 on Alpine Linux running under WSLv2):
In [2]: %%timeit l = [1, 2, 3]
...: tuple(l)
97.6 ns ± 0.303 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [3]: %%timeit l = [1, 2, 3]
...: (l[0], l[1], l[2])
104 ns ± 0.561 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: %%timeit l = [1, 2, 3]
...: (*l,)
78.1 ns ± 0.628 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [5]: %%timeit l = [1, 2]
...: tuple(l)
96 ns ± 0.895 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %%timeit l = [1, 2]
...: (l[0], l[1])
70.1 ns ± 0.571 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [7]: %%timeit l = [1, 2]
...: (*l,)
73.4 ns ± 0.736 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
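If you don't have IPython handy, a rough equivalent with just the stdlib timeit module looks like this (absolute numbers will vary by machine, but the relative ordering should hold):

# Stdlib equivalent of the %timeit runs above.
from timeit import timeit

for stmt in ("tuple(l)", "(l[0], l[1], l[2])", "(*l,)"):
    # Each statement runs 1,000,000 times by default; prints total seconds.
    print(stmt, timeit(stmt, setup="l = [1, 2, 3]"))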
So in fact, the code example you gave made the correct decision for each size (assuming performance is all that counts): at two elements, indexing is faster than the alternatives; at three, converting to tuple in bulk saves enough over repeated indexing to win.
Just for fun, I included an equivalent solution to tuple(l) up there that uses the additional unpacking generalizations to build the tuple with dedicated bytecodes. It shows how something as small as replacing a generalized constructor call with dedicated, optimized bytecode can make a surprising amount of difference in the fixed overhead.
What's extra fun about this example: the faster (*l,) solution actually involves two temporaries. BUILD_TUPLE_UNPACK (the bytecode that implements it) shares a code path with BUILD_LIST_UNPACK; both of them actually build a list, and BUILD_TUPLE_UNPACK just converts it to a tuple at the end. So (*l,) is hiding yet another copy to a temporary data structure, but because the specialized bytecode is so much more efficient than a built-in lookup plus general-purpose constructor code paths, it still wins.
Let's use the disassembler to see what bytecode Python generates for this:
>>> names = ['alex', 'ramon', 'carla']
>>> from dis import dis
>>> dis('abc')
  1           0 LOAD_NAME                0 (abc)
              2 RETURN_VALUE
>>> dis('"%s, %s and %s like this" % tuple(names)')
  1           0 LOAD_CONST               0 ('%s, %s and %s like this')
              2 LOAD_NAME                0 (tuple)
              4 LOAD_NAME                1 (names)
              6 CALL_FUNCTION            1
              8 BINARY_MODULO
             10 RETURN_VALUE
>>> dis('"%s, %s and %s like this" % (names[0], names[1], names[2])')
  1           0 LOAD_CONST               0 ('%s, %s and %s like this')
              2 LOAD_NAME                0 (names)
              4 LOAD_CONST               1 (0)
              6 BINARY_SUBSCR
              8 LOAD_NAME                0 (names)
             10 LOAD_CONST               2 (1)
             12 BINARY_SUBSCR
             14 LOAD_NAME                0 (names)
             16 LOAD_CONST               3 (2)
             18 BINARY_SUBSCR
             20 BUILD_TUPLE              3
             22 BINARY_MODULO
             24 RETURN_VALUE
The fact that the disassembler shows a lot more instructions for the second approach doesn't necessarily mean it's slower. After all, the function call is just the opaque CALL_FUNCTION, so you have to use judgement and know what that call is doing. But it seems you're building a tuple either way…
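For completeness, here's the unpacking trick disassembled in the same session. The whole unpack-and-copy dance described above collapses into a single BUILD_TUPLE_UNPACK instruction (this is Python 3.8 output; the opcode was removed in 3.9, so newer interpreters show different instructions):

>>> dis('(*names,)')
  1           0 LOAD_NAME                0 (names)
              2 BUILD_TUPLE_UNPACK       1
              4 RETURN_VALUE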