
Python list elements vs converting to tuple for string formatting

I came across a problem on Codewars and am not sure what the difference is between these two possible solutions: one converts the list into a tuple, the other indexes individual elements of the input list.

Problem: convert a list of names (strings) to a statement similar to what Facebook uses to display likes: "Alex likes this", "Alex and John like this", "Alex, John and 2 others like this", etc.

Using an if/elif chain, this is pretty trivial:

    if len(names) == 0:
        output_string = "no one likes this"
    elif len(names) == 1:
        output_string = str(names[0]) + " likes this"

But for the longer lists of names, you have a choice:

    elif len(names) == 2:
        output_string = "%s and %s like this" % (names[0], names[1])

OR

    elif len(names) == 3:
        output_string = "%s, %s and %s like this" % tuple(names)

My hypothesis is that it's more computationally efficient to use names[0] etc., because you don't create a new object in memory for the tuple - is that right?

CPython optimization rules are usually based around how much work you push to the C layer (vs. the bytecode interpreter) and how complex the bytecode instructions are; at low levels of absolute work, the fixed overhead of the interpreter tends to swamp the real work, so intuition derived from lower-level languages just doesn't apply.

It's pretty easy to test, though, especially with IPython's %timeit magic (timings taken on Python 3.8.5 on Alpine Linux running under WSLv2):

    In [2]: %%timeit l = [1, 2, 3]
       ...: tuple(l)
    97.6 ns ± 0.303 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [3]: %%timeit l = [1, 2, 3]
       ...: (l[0], l[1], l[2])
    104 ns ± 0.561 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [4]: %%timeit l = [1, 2, 3]
       ...: (*l,)
    78.1 ns ± 0.628 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [5]: %%timeit l = [1, 2]
       ...: tuple(l)
    96 ns ± 0.895 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [6]: %%timeit l = [1, 2]
       ...: (l[0], l[1])
    70.1 ns ± 0.571 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [7]: %%timeit l = [1, 2]
       ...: (*l,)
    73.4 ns ± 0.736 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
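
If you're not using IPython, the standard-library timeit module measures the same comparison (a minimal sketch; absolute numbers will vary by machine and Python version):

    import timeit

    SETUP = "l = [1, 2, 3]"
    for expr in ("tuple(l)", "(l[0], l[1], l[2])", "(*l,)"):
        # timeit.timeit returns total seconds for 1,000,000 runs;
        # multiplying by 1000 converts that to nanoseconds per run.
        secs = timeit.timeit(expr, setup=SETUP, number=1_000_000)
        print("%-20s %6.1f ns per loop" % (expr, secs * 1000))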

So in fact, the code example you gave made the correct decision for each size (assuming performance is all that counts): at two elements, indexing is faster than the alternatives; at three, converting to tuple in bulk saves enough over repeated indexing to win.

Just for fun, I included an equivalent to tuple(l) above that uses the additional unpacking generalizations to build the tuple with dedicated bytecodes, which shows how something as small as replacing a generalized constructor call with dedicated, optimized bytecode can make a surprising amount of difference in the fixed overhead.

What's extra fun about this example: the faster (*l,) solution actually involves two temporaries. BUILD_TUPLE_UNPACK (the bytecode that implements it) shares a code path with BUILD_LIST_UNPACK; both of them actually build a list, and BUILD_TUPLE_UNPACK just converts it to a tuple at the end. So (*l,) is hiding yet another copy to a temporary data structure, but because the specialized bytecode is so much more efficient than the built-in name lookup plus general-purpose constructor code paths, it still wins.
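
You can see the dedicated opcode with the disassembler; on Python 3.8 the expression compiles roughly as follows (3.9 and later removed BUILD_TUPLE_UNPACK in favor of LIST_EXTEND plus LIST_TO_TUPLE):

    >>> from dis import dis
    >>> dis('(*l,)')
      1           0 LOAD_NAME                0 (l)
                  2 BUILD_TUPLE_UNPACK       1
                  4 RETURN_VALUE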

Let's use the disassembler to see what bytecode Python generates for this:

    >>> names=['alex', 'ramon', 'carla']
    >>> from dis import dis
    >>> dis('abc')
      1           0 LOAD_NAME                0 (abc)
                  2 RETURN_VALUE
    >>> dis('"%s, %s and %s like this" % tuple(names)')
      1           0 LOAD_CONST               0 ('%s, %s and %s like this')
                  2 LOAD_NAME                0 (tuple)
                  4 LOAD_NAME                1 (names)
                  6 CALL_FUNCTION            1
                  8 BINARY_MODULO
                 10 RETURN_VALUE
    >>> dis('"%s, %s and %s like this" % (names[0], names[1], names[2])')
      1           0 LOAD_CONST               0 ('%s, %s and %s like this')
                  2 LOAD_NAME                0 (names)
                  4 LOAD_CONST               1 (0)
                  6 BINARY_SUBSCR
                  8 LOAD_NAME                0 (names)
                 10 LOAD_CONST               2 (1)
                 12 BINARY_SUBSCR
                 14 LOAD_NAME                0 (names)
                 16 LOAD_CONST               3 (2)
                 18 BINARY_SUBSCR
                 20 BUILD_TUPLE              3
                 22 BINARY_MODULO
                 24 RETURN_VALUE

The fact that the disassembler shows a lot more instructions for the second approach doesn't necessarily mean it's slower. After all, a function call is just the opaque CALL_FUNCTION, so you have to use judgement and know what that call is doing. But it seems you're building a tuple either way…

Take a look at a visualization of your code (it shows the memory units your code uses):

[Image: visualization of the code in memory]

In your code, the value that ends up stored in output_string is a string. Even though you wrote tuple(names), the tuple is only a short-lived temporary: once the % formatting finishes, nothing refers to it any more, and only the string remains. Consider the above visualization of your code in memory terms.
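
A minimal sketch to see this for yourself (the variable name args is just for illustration):

    names = ['alex', 'ramon', 'carla']
    args = tuple(names)   # a tuple object exists, but only briefly
    output_string = "%s, %s and %s like this" % args
    del args              # nothing else references the tuple after formatting
    print(output_string)        # alex, ramon and carla like this
    print(type(output_string))  # <class 'str'>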
