
Insertion sort worst time complexity for near sorted array?

I have an n-element array. All elements except 4√n of them are sorted. We do not know the positions of these misplaced elements. What is the most efficient way of sorting this list?

Is there an O(n) way to do this?

Update 1:

The time complexity of insertion sort is said to be O(n) for almost sorted data. Is that also true in the worst case?

There is a fast general method for sorting almost sorted arrays:

  1. Scan through the original array from start to end. If you find two adjacent items that are not ordered correctly, move them to a second array and remove them from the first array. Be careful; for example, if you remove x2 and x3, then you need to check again that x1 ≤ x4, because those two are now neighbours. This is done in O(n) time. In your case, the new array is at most 8√n in size.

  2. Sort the second array, then merge both arrays. With the small number of items in the second array, any reasonable sorting algorithm will sort the small second array in O(n), and the merge takes O(n) again, so the total time is O(n).

If you use an O(n log n) algorithm to sort the second array, then sorting is O(n) as long as the number of items in the wrong position is at most O(n / log n).
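
To spell out why step 2 stays within O(n), here is the arithmetic as a short sketch. The symbol m is introduced here for the size of the second array; it does not appear in the answer above.

```latex
\begin{align*}
m \le 8\sqrt{n}
  &\;\Longrightarrow\; m \log m \le 8\sqrt{n}\,\log\!\left(8\sqrt{n}\right)
   = O\!\left(\sqrt{n}\,\log n\right) \subseteq O(n), \\
m = O\!\left(\frac{n}{\log n}\right),\; m \le n
  &\;\Longrightarrow\; m \log m \le m \log n = O(n).
\end{align*}
```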

No, insertion sort isn't O(n) on that. Worst case is when it's the last 4√n elements that are misplaced, and they're so small that they belong at the front of the array. It'll take insertion sort Θ(n √n) to move them there.
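
A minimal sketch to make that worst case concrete, assuming a plain textbook insertion sort. The helper names insertion_sort_shifts and worst_case_input are made up for this illustration. Each of the k = 4√n small trailing elements has to travel past all n - k large ones, so the shift count is exactly (n - k) · k, which grows like n √n.

```python
from math import isqrt

def insertion_sort_shifts(a):
    """Sort a in place with plain insertion sort and return the number of element shifts."""
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i
        # Shift larger elements one slot to the right until x fits.
        while j > 0 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = x
    return shifts

def worst_case_input(n):
    """n - k large sorted values followed by k = 4*isqrt(n) small sorted values (hypothetical helper)."""
    k = 4 * isqrt(n)
    return list(range(k, n)) + list(range(k))
```

For example, with n = 400 there are k = 80 misplaced elements, and insertion_sort_shifts(worst_case_input(400)) reports 320 · 80 shifts.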

Here's a Python implementation of gnasher729's answer that's O(n) time and O(n) space on such near-sorted inputs. We can't naively "remove" pairs from the array, though; that would be inefficient. Instead, I move correctly sorted values into a good list and the misordered pairs into a bad list. So as long as the numbers are increasing, they're just added to good. But if the next number x is smaller than the last good number good[-1], then they're both moved to bad. When I'm done, I concatenate good and bad and let Python's Timsort do the rest. It detects the already sorted run good in O(n - √n) time, then sorts the bad part in O(√n log √n) time, and finally merges the two sorted parts in O(n) time.

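def sort1(a):
    good, bad = [], []
    for x in a:
        if good and x < good[-1]:
            # x and the last good value are out of order: move both to bad.
            bad += x, good.pop()
        else:
            # x extends the non-decreasing run collected so far.
            good += x,
    # good is already sorted, so Timsort only has to sort bad and merge.
    a[:] = sorted(good + bad)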

Next is a space-improved version that takes O(n) time and only O(√n) space. Instead of storing the good part in an extra list, I store it in a[:good]:

def sort2(a):
    good, bad = 0, []
    for x in a:
        if good and x < a[good-1]:
            bad += x, a[good-1]
            good -= 1
        else:
            a[good] = x
            good += 1
    a[good:] = bad
    a.sort()

And here's another O(n) time and O(√n) space variation where I let Python sort bad for me, but then merge the good part with the bad part myself, from right to left. So this doesn't rely on Timsort's sorted-run detection and is thus easily ported to other languages:

def sort3(a):
    good, bad = 0, []
    for x in a:
        if good and x < a[good-1]:
            bad += x, a[good-1]
            good -= 1
        else:
            a[good] = x
            good += 1
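    bad.sort()
    # Merge a[:good] and bad from right to left, filling a from its end.
    i = len(a)
    while bad:
        i -= 1
        if good and a[good-1] > bad[-1]:
            good -= 1
            a[i] = a[good]
        else:
            a[i] = bad.pop()
    # Once bad is empty, i == good, so the rest of a[:good] is already in place.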

Finally, some test code:

from random import random, sample
from math import isqrt

def sort1(a):
    ...

def sort2(a):
    ...

def sort3(a):
    ...

def fake(a):
    """Intentionally do nothing, to show that the test works."""

def main():
    n = 10**6
    a = [random() for _ in range(n)]
    a.sort()
    for i in sample(range(n), 4 * isqrt(n)):
        a[i] = random()

    for sort in sort1, sort2, sort3, fake:
        copy = a.copy()
        sort(copy)
        print(sort.__name__, copy == sorted(a))

if __name__ == '__main__':
    main()

The output shows that all three solutions passed the test (and that the test works, detecting fake as incorrect):

sort1 True
sort2 True
sort3 True
fake False

Fun fact: For Timsort alone (i.e., not used as part of the above algorithms), the worst case I mentioned above is rather a best case: It would sort that in O(n) time. Just like in my first version's sorted(good + bad), it'd recognize the prefix of n-√n sorted elements in O(n - √n) time, sort the √n last elements in O(√n log √n) time, and then merge the two sorted parts in O(n) time.

So can we just let Timsort do the whole thing? Is it O(n) on all such near-sorted inputs? No, it's not. If the 4√n misplaced elements are evenly spread over the array, then we have up to 4√n sorted runs and Timsort will take O(n log(4√n)) = O(n log n) time to merge them.
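
As a rough illustration of that evenly spread case, the sketch below builds such an input and counts its maximal non-decreasing runs. The helper names spread_input and count_ascending_runs are made up here, and the count is only a proxy for what Timsort sees, since the real implementation also handles descending runs and extends short runs to a minimum length. The construction assumes n ≥ 64 so that the misplaced positions are not adjacent.

```python
from math import isqrt

def spread_input(n):
    """Sorted values 1..n with k = 4*isqrt(n) evenly spaced entries overwritten by 0 (assumes n >= 64)."""
    k = 4 * isqrt(n)
    step = n // k
    a = list(range(1, n + 1))
    for j in range(k):
        # Each overwritten slot is smaller than its left neighbour, which ends a run there.
        a[step // 2 + j * step] = 0
    return a

def count_ascending_runs(a):
    """Number of maximal non-decreasing runs in a."""
    if not a:
        return 0
    return 1 + sum(1 for prev, cur in zip(a, a[1:]) if cur < prev)
```

With n = 10**4 this gives k = 400 misplaced elements and 401 runs, so the number of runs to merge grows with √n rather than staying constant.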
