简体   繁体   中英

python array time complexity?

What's the .append time complexity of array.array and np.array ?

I see time complexity for list , collections.deque , set , and dict in python_wiki , but I can't find the time complexity of array.array and np.array . Where can I find them?

So to link you provided (also a TLDR ) list are internally "represented as an array" link It's supposed to be O(1) with a note at the bottom saying:

"These operations rely on the "Amortized" part of "Amortized Worst Case". Individual actions may take surprisingly long, depending on the history of the container." link


More details

It doesn't go into detail in the docs but if you look at the source code you'll actually see what's going on. Python array s have internal buffer(s) that allow for quick resizing of themselves and will realloc as it grows/shrinks.

array.append uses arraymodule.array_array_append which calls arraymodule.ins calling arraymodule.ins1 which is the meat and potatoes of the operation. Incidentally array.extend uses this as well but it just supplies it Py_SIZE(self) as the insertion index.

So if we read the notes in arraymodule.ins1 it starts off with:

 Bypass realloc() when a previous overallocation is large enough to accommodate the newsize. If the newsize is 16 smaller than the current size, then proceed with the realloc() to shrink the array.

link

...

 This over-allocates proportional to the array size, making room for additional growth. The over-allocation is mild, but is enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly-performing system realloc(). The growth pattern is: 0, 4, 8, 16, 25, 34, 46, 56, 67, 79, ... Note, the pattern starts out the same as for lists but then grows at a smaller rate so that larger arrays only overallocate by about 1/16th -- this is done because arrays are presumed to be more memory critical.

link

It is important to understand the array data structure to answer your question. Since both array objects are based on C arrays (regular , numpy ), they share a lot of the same functionality.

Adding an item to an array is amortized O(1) , but in most cases, ends up being O(n) time. This is because it could be the case that your array object is not filled yet, and thus appending some data to that spot in memory is a relatively trivial exercise, it is O(1) . However, in most cases, the array is full and thus needs to be completely copied over in memory with the new item added to it. This is a very expensive operation since an array of n size needs to be copied, thus making the insertion O(n) .

An interesting example from this post:

To make this clearer, consider the case where the factor is 2 and initial array size is 1 . Then consider copy costs to grow the array from size 1 to where it's large enough to hold, 2^k+1 elements for any k >= 0 . This size is 2^(k+1) . Total copy costs will include all the copying to become that big in factor-of-2 steps:

1 + 2 + 4 + ... + 2^k = 2^(k+1) - 1 = 2n - 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM