
Performance of getting a specific character of a string in Python 2.7

Suppose I want to get a specific character of a string in Python 2.7, for example:

a = 'abcdefg...' # a long string
print a[5]

When accessing a specific character of a string, say the one at index 5, what is the performance? Is it constant time, O(1)? Is it linear, O(n), in the position being accessed (5 here)? Or is it linear, O(n), in the length of the whole string (len(a) in this example)?

>>> import random, string, timeit
>>> long_string_1M = "".join(random.choice(string.printable) for _ in xrange(1000000))
>>> short_string = "hello"
>>> timeit.timeit(lambda:long_string_1M[50000])
0.1487280547441503
>>> timeit.timeit(lambda:short_string[4])
0.1368805315209798
>>> timeit.timeit(lambda:short_string[random.randint(0,4)])
1.7327393072888242
>>> timeit.timeit(lambda:long_string_1M[random.randint(50000,100000)])
1.779330312345877

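Looks like O(1) to me. Indexing the 1,000,000-character string takes about as long as indexing the 5-character string. The two random-index runs are slower only because each call also pays for random.randint.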
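The interactive timings above mix the lookup with the cost of random.randint. Below is a slightly more controlled sketch that uses only the standard library. It draws the random positions before the clock starts, then times lookups into strings of very different lengths. The helper name time_indexing and its parameters are hypothetical and chosen for illustration. If indexing is O(1), the three timings should land in the same ballpark. Small differences can come from CPU caching rather than from any scan of the string.

```python
import random
import string
import timeit


def time_indexing(length, number=200000, seed=0):
    """Seconds taken by `number` single-character lookups into a `length`-char string.

    The random indices are generated before timing starts, so the cost of
    random.randint/randrange is not included in the measurement.
    """
    rng = random.Random(seed)
    s = "".join(rng.choice(string.ascii_letters) for _ in range(length))
    # Precompute positions spread over the whole string.
    indices = [rng.randrange(length) for _ in range(number)]

    def lookups():
        for i in indices:
            s[i]  # the operation being measured: one index into the string

    return timeit.timeit(lookups, number=1)


if __name__ == "__main__":
    for n in (10, 10000, 1000000):
        print("len=%8d  %.4f s" % (n, time_indexing(n)))
```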

They achieve it because a string occupies consecutive memory locations, so indexing into it is simply a matter of computing an offset. There is no seek, at least as I understand it. If you know C/C++, it is something like *(pointer + offset). It's been a long time since I've done C, so that might be a little off.
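Here is a small Python illustration of the *(pointer + offset) idea. It uses a ctypes buffer as a stand-in for a C char array. It deliberately does not poke into a Python str object, because that object's internal layout is an implementation detail. char_at is a hypothetical helper. It takes the buffer's base address and reads the byte at base + i, which costs one addition and one read whatever the index is.

```python
import ctypes


def char_at(buf, i):
    """Read byte i of a ctypes char buffer as *(base + i), the way C indexes an array."""
    if not 0 <= i < ctypes.sizeof(buf):
        raise IndexError(i)
    base = ctypes.addressof(buf)  # address of the buffer's first byte
    # One pointer addition and one read, independent of i and of the buffer size.
    return ctypes.c_char.from_address(base + i).value


buf = ctypes.create_string_buffer(b"abcdefg")  # 7 chars plus a trailing NUL
print(char_at(buf, 5))
```

On Python 3 this prints b'f'. On Python 2.7 it prints f.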

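In addition to Joran's answer, I'd point you to this part of the CPython 2.7 reference implementation, which confirms that it is an O(1) lookup. The slice's first character is located by plain pointer arithmetic (a->ob_sval + i), not by walking the string: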

/* String slice a[i:j] consists of characters a[i] ... a[j-1] */        
static PyObject *    
string_slice(register PyStringObject *a, register Py_ssize_t i,    
             register Py_ssize_t j)    
     /* j -- may be negative! */    
{    
    if (i < 0)    
        i = 0;    
    if (j < 0)    
        j = 0; /* Avoid signed/unsigned bug in next line */    
    if (j > Py_SIZE(a))    
        j = Py_SIZE(a);    
    if (i == 0 && j == Py_SIZE(a) && PyString_CheckExact(a)) {    
        /* It's the same as a */    
        Py_INCREF(a);    
        return (PyObject *)a;    
    }    
    if (j < i)  
        j = i;    
    return PyString_FromStringAndSize(a->ob_sval + i, j-i);    
}
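The control flow of the C function is easier to follow as a line-by-line Python transliteration, given here for illustration only. The name string_slice is reused from the C code, and type(a) is str stands in for PyString_CheckExact. The sketch mirrors only the clamping in the C function. It does not reproduce Python-level negative indices, which count from the end ("abcdefg"[-3:2] is ""), whereas here a negative i is simply clamped to 0. It also makes the cost split explicit: finding the start is constant time, but copying the slice is O(j - i).

```python
def string_slice(a, i, j):
    """Illustrative Python transliteration of the C string_slice above."""
    n = len(a)                      # Py_SIZE(a)
    if i < 0:
        i = 0
    if j < 0:
        j = 0                       # clamp, as in the C code
    if j > n:
        j = n
    if i == 0 and j == n and type(a) is str:
        return a                    # "It's the same as a": no copy at all
    if j < i:
        j = i                       # empty slice
    # Locating the start is O(1) (a->ob_sval + i in C); building the result
    # copies j - i characters, so this step is O(j - i).
    return "".join(a[k] for k in range(i, j))
```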

Why this should be your intuition

Python strings are immutable. This common design choice allows tricks like assuming contiguous data when desirable. Under the hood, indexing then often amounts to computing an offset from a memory location in C (obviously implementation-specific).
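Both properties can be checked quickly from Python. This is a CPython-specific sketch. Item assignment on a str raises TypeError. sys.getsizeof grows by exactly one byte per extra ASCII character on top of a fixed header. That size pattern is consistent with one contiguous buffer allocated at creation time, though it does not prove it. The helper names are hypothetical. Other implementations, such as PyPy, may report sizes differently or not at all.

```python
import sys


def assignment_fails(seq, i, ch):
    """Return True if seq[i] = ch raises TypeError (as it does for immutable str)."""
    try:
        seq[i] = ch
    except TypeError:
        return True
    return False


def bytes_per_extra_char(n=1000):
    """Size difference in bytes between ASCII strings of length n + 1 and n (CPython)."""
    return sys.getsizeof("a" * (n + 1)) - sys.getsizeof("a" * n)


print(assignment_fails("abcdefg", 5, "x"))  # str: immutable
print(bytes_per_extra_char())
```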

There are several places where the immutability of strings is something that can be relied on (or vexed by). In the words of Python's author:

There are several advantages [to strings being immutable]. One is performance: knowing that a string is immutable means we can allocate space for it at creation time

So although, as far as I know, this behaviour is not guaranteed across implementations, it's an awfully safe assumption.
