简体   繁体   中英

Does Slicing `a` (e.g. `a[1:] == a[:-1]`) create copies of the `a`?

A friend of mine showed me the following Python code:

a[1:] == a[:-1]

Which returns True iff all the items in a are identical.

I argued the code is hard to understand from first sight, and moreover - it is inefficient in memory usage, because two copies of a will be created for the comparison.

I've used Python's dis to see what happens behind the hood at a[1:]==a[:-1] :

>>> def stanga_compare(a):
...     return a[1:]==a[:-1]
...
>>> a=range(10)
>>> stanga_compare(a)
False
>>> a=[0 for i in range(10)]
>>> stanga_compare(a)
True

>>> dis.dis(stanga_compare)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_CONST               1 (1)
              6 SLICE+1
              7 LOAD_FAST                0 (a)
             10 LOAD_CONST               2 (-1)
             13 SLICE+2
             14 COMPARE_OP               2 (==)
             17 RETURN_VALUE

It boils down to two slicing commands - SLICE+1 and SLICE+2 . The documentation is unclear as to whether these opcodes actually create a new copy of a , or just a reference to it.

  • Does the SLICE commands copy a ?
  • Does the answer varies between Python implementations (Cython, Jython)?

Update

This snippet is clearly unreadable and confusing, and I'm not going to ever use it in actual code. My interest is purely technical - whether slicing copies the list, and whether the answer varies under different circumstances.

The documentation is unclear because slicing different objects does different things. In the case of a list, slicing does make a (shallow) copy 1 . Note that this is a feature of python lists independent of python implementation . In the case of other objects (like numpy arrays), it might not create a copy.

If you want a better way to check that all the elements in a list are the same, I would probably recommend:

 all(lst[0] == item for item in lst)

From a performance standpoint, your friend's code might actually outperform this for small lists since list slicing is so optimized. But, this is IMHO much easier to tell what is going on and has the opportunity to "short circuit" as soon as it finds a non-match.

1 The actual function to look at is list_subscript , but for most cases, it just calls list_slice

If a is a list or tuple or string, len(a) is n , and n > 0 , then each slice creates (at C level) a new array of length n-1 . At C level, all objects in CPython are implemented as pointers, so these new arrays contain n-1 pointers copied from a (well, not really for strings - the string representation is more frugal).

But, as @mgilson said, what slicing returns is up to a 's type. Some types may return a compact descriptor instead of copying anything. And a type may even implement slicing in such a way that the code shown wouldn't work as intended.

But you really had a list in mind ;-)

Yes, with list objects Python creates shallow copies when slicing, however the loop is made in C (for cpython) and for that reason is going to be much faster than anything you can write in Python for doing the same. Looping two times in C for the shallow copy and looping again for comparison is going to be faster than just looping in Python once.

Please remember than cpython is quite often fast enough, but that Python code is still about 100 times slower than C code. Leaving cpython doing the loops for you is therefore better in general if you want a bit of speed. Note that even things like c = a + b in Python mean the execution of a lot of logic (including branches and memory allocations).

On the other side however if for your code this kind of micro optimization is fundamental then probably Python is not the right tool for the problem you are fighting and you should consider other options (like writing a small C++ extension interfaced with sip, using Cython, PyPy ...).

For sure that code is not readable for the untrained eye and in case the list is long and often not constant then all(y == x[0] for y in x) is going to be faster because of short circuiting (even if the loop is in Python and an extra subscript operation is done for every element).

Readability counts. A lot.

EDIT

Another interesting option for having C code looping over the elements is

x and x.count(x[0]) == len(x)

This doesn't offer short-circuiting, but on my pc is about 75 times faster than the all -based solution for a list for 1000 elements all of them equal and about 6 times faster than x[1:] == x[:-1] .

I find it also a bit more readable than x[1:] == x[:-1] , but it's probably a matter of taste.

For vanilla lists, slicing does create a copy. You can prevent the copy using iteration instead:

import itertools
a1 = iter(a)
a2 = iter(a)
a2.next() # start a2 iterator one removed
all_are_identical = all((i1 == i2 for i1, i2 in itertools.izip(a1, a2)))

The line (i1 == i2 for i1, i2 in itertools.izip(a1, a2)) creates a generator that will return whether each element in a is equal to the next, one at a time, to all . The results are evaluated one by one, instead of being put in a list first, so you save memory at the cost of some performance.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM