
Insertion sort worst-case time complexity for a nearly sorted array?

I have an n-element array. All elements except 4√n of them are sorted. We do not know the positions of these misplaced elements. What is the most efficient way of sorting this list?

Is there an O(n) way to do this?

Update 1:

The time complexity of insertion sort is said to be O(n) for almost sorted data. Is that also true in the worst case?

There is a fast general method for sorting almost sorted arrays:

  1. Scan through the original array from start to end. If you find two neighboring items that are not ordered correctly, move both to a second array and remove them from the first array. Be careful: for example, if you remove x2 and x3, you then need to check that x1 ≤ x4. This takes O(n) time. In your case, the second array holds at most 8√n items, because each removed pair contains at least one of the 4√n misplaced elements.

  2. Sort the second array, then merge both arrays. With the small number of items in the second array, any reasonable sorting algorithm will sort the small second array in O(n), and the merge takes O(n) again, so the total time is O(n).

If you use an O(n log n) algorithm to sort the second array, then the whole sort is O(n) as long as the number of items in the wrong position is O(n / log n).
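As a quick check of that bound, here is the arithmetic written out. The symbols m (size of the second array) and c (a constant) are my own notation, not part of the answer above, and I assume m ≤ n so that log m ≤ log n.

```latex
\begin{aligned}
m \le \frac{c\,n}{\log n}
  \;&\Longrightarrow\;
  m \log m \;\le\; \frac{c\,n}{\log n}\cdot\log n \;=\; c\,n \;=\; O(n), \\
m \le 8\sqrt{n}
  \;&\Longrightarrow\;
  m \log m \;\le\; 8\sqrt{n}\,\log\!\left(8\sqrt{n}\right) \;=\; O\!\left(\sqrt{n}\,\log n\right) \;=\; o(n).
\end{aligned}
```

The second line is the case from the question: sorting the second array costs far less than the O(n) scan and merge.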

No, insertion sort isn't O(n) on that. Worst case is when it's the last 4√n elements that are misplaced, and they're so small that they belong at the front of the array. It'll take insertion sort Θ(n √n) to move them there.
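To make that worst case concrete, here is a small sketch that counts how many element shifts a textbook insertion sort performs on such an input. The names insertion_sort_shifts and worst_case are made up for this illustration, and the shift count is only a proxy for running time.

```python
from math import isqrt

def insertion_sort_shifts(a):
    """Sort a in place with plain insertion sort and return the number of shifts."""
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i
        # Move every larger element one slot to the right.
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = x
    return shifts

def worst_case(n):
    """Sorted prefix followed by the 4*isqrt(n) smallest values at the end."""
    k = 4 * isqrt(n)
    return list(range(k, n)) + list(range(k))

if __name__ == '__main__':
    n = 2500
    k = 4 * isqrt(n)
    a = worst_case(n)
    shifts = insertion_sort_shifts(a)
    # Each of the k trailing elements passes all n - k prefix elements.
    print(shifts, k * (n - k))
```

Each of the k = 4√n trailing elements has to travel past all n - k prefix elements, so the shift count is k(n - k), which grows like n√n.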

Here's a Python implementation of gnasher729's answer that's O(n) time and O(n) space on such near-sorted inputs. We can't naively "remove" pairs from the array, though; that would be inefficient. Instead, I move correctly sorted values into a good list and the misordered pairs into a bad list. So as long as the numbers are increasing, they're just added to good. But if the next number x is smaller than the last good number good[-1], then they're both moved to bad. When I'm done, I concatenate good and bad and let Python's Timsort do the rest. It detects the already sorted run good in O(n) time, then sorts the bad part in O(√n log √n) time, and finally merges the two sorted parts in O(n) time.

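def sort1(a):
    good, bad = [], []
    for x in a:
        if good and x < good[-1]:
            # x breaks the order: move it and the last good value to bad.
            bad += x, good.pop()
        else:
            good += x,
    # good is one sorted run; Timsort sorts bad and merges the two.
    a[:] = sorted(good + bad)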
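As a sanity check on the size of bad, here is a sketch that runs only the splitting step and compares the result with the 8√n bound from gnasher729's answer. The helpers split_good_bad and near_sorted are names I made up for this check; the input generator mirrors the one in the test code further down, with a seeded random.Random so the run is repeatable.

```python
import random
from math import isqrt

def split_good_bad(a):
    """Return (good, bad) as built by the loop in sort1, without sorting."""
    good, bad = [], []
    for x in a:
        if good and x < good[-1]:
            bad.append(x)
            bad.append(good.pop())
        else:
            good.append(x)
    return good, bad

def near_sorted(n, rng):
    """Sorted list of n random floats with 4*isqrt(n) positions overwritten."""
    a = sorted(rng.random() for _ in range(n))
    for i in rng.sample(range(n), 4 * isqrt(n)):
        a[i] = rng.random()
    return a

if __name__ == '__main__':
    n = 10**4
    good, bad = split_good_bad(near_sorted(n, random.Random(0)))
    print(len(good), len(bad), 8 * isqrt(n))
```

The bound holds because two untouched elements can never form a misordered pair, so every pair moved to bad contains at least one overwritten element.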

Next is a space-improved version that takes O(n) time and only O(√n) space. Instead of storing the good part in an extra list, I store it in a[:good]:

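def sort2(a):
    good, bad = 0, []   # a[:good] holds the good values
    for x in a:
        if good and x < a[good-1]:
            bad += x, a[good-1]
            good -= 1
        else:
            # Safe while iterating: good never exceeds the current read index.
            a[good] = x
            good += 1
    a[good:] = bad
    a.sort()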

And here's another O(n) time and O(√n) space variation where I let Python sort bad for me, but then merge the good part with the bad part myself, from right to left. So this doesn't rely on Timsort's sorted-run detection and is thus easily ported to other languages:

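def sort3(a):
    good, bad = 0, []   # a[:good] holds the good values
    for x in a:
        if good and x < a[good-1]:
            bad += x, a[good-1]
            good -= 1
        else:
            a[good] = x
            good += 1
    bad.sort()
    # Merge a[:good] and bad from right to left into a.
    i = len(a)
    while bad:
        i -= 1
        if good and a[good-1] > bad[-1]:
            good -= 1
            a[i] = a[good]
        else:
            a[i] = bad.pop()
    # Once bad is empty, the rest of a[:good] is already in place.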

Finally, some test code:

from random import random, sample
from math import isqrt

def sort1(a):
    ...

def sort2(a):
    ...

def sort3(a):
    ...

def fake(a):
    """Intentionally do nothing, to show that the test works."""

def main():
    n = 10**6
    a = [random() for _ in range(n)]
    a.sort()
    for i in sample(range(n), 4 * isqrt(n)):
        a[i] = random()

    for sort in sort1, sort2, sort3, fake:
        copy = a.copy()
        sort(copy)
        print(sort.__name__, copy == sorted(a))

if __name__ == '__main__':
    main()

The output shows that all three solutions passed the test (and that the test works, detecting fake as incorrect):

sort1 True
sort2 True
sort3 True
fake False

Fun fact: For Timsort alone (i.e., not used as part of the above algorithms), the worst case I mentioned above is rather a best case: It would sort that in O(n) time. Just like in my first version's sorted(good + bad), it'd recognize the prefix of n - 4√n sorted elements in O(n) time, sort the last 4√n elements in O(√n log n) time, and then merge the two sorted parts in O(n) time.

So can we just let Timsort do the whole thing? Is it O(n) on all such near-sorted inputs? No, it's not. If the 4√n misplaced elements are evenly spread over the array, then we have up to 4√n sorted runs and Timsort will take O(n log(4√n)) = O(n log n) time to merge them.
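To illustrate the difference between the two layouts, here is a rough sketch that counts maximal non-decreasing runs. This is only a proxy: as far as I know, real Timsort also recognizes strictly descending runs and extends short runs to a minimum length, so its actual run count can differ. The function names are made up for this illustration.

```python
from math import isqrt

def count_runs(a):
    """Count maximal non-decreasing runs in a (a simplified view of run detection)."""
    if not a:
        return 0
    return 1 + sum(1 for i in range(1, len(a)) if a[i] < a[i - 1])

def evenly_spread(n):
    """Sorted values with a too-small value planted at evenly spaced positions."""
    k = 4 * isqrt(n)
    step = n // k
    a = list(range(n))
    for i in range(step - 1, n, step):
        a[i] = -1
    return a

def clustered_at_end(n):
    """Sorted prefix followed by the 4*isqrt(n) smallest values."""
    k = 4 * isqrt(n)
    return list(range(k, n)) + list(range(k))

if __name__ == '__main__':
    n = 10**4
    print(count_runs(evenly_spread(n)), count_runs(clustered_at_end(n)))
```

With the misplaced values clustered at the end there are just two runs to merge, while spreading them out produces about 4√n runs, each of which has to take part in merging.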
