
How to sort an array which is mostly sorted

I have an array like this:
1,2,3,5,6,4
It is 99% sorted and has 40K elements.
I can put them in an array, list, linked list, ...
but I don't know the fastest way to sort them!

The following site compares common sorting algorithms; when the collection is nearly sorted, insertion sort appears to win.
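To make the insertion sort suggestion concrete, here is a minimal sketch in Python (the language is my choice, the question does not name one, and `insertion_sort` is just an illustrative name). Each element is shifted left only as far as it is out of place, so the work grows with how displaced the elements are rather than with the full size of the array.

```python
def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        # Shift larger elements one slot to the right until the
        # correct position for `value` is found.
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = value
    return items


print(insertion_sort([1, 2, 3, 5, 6, 4]))  # [1, 2, 3, 4, 5, 6]
```

On input that is already in order the inner loop never runs, so the whole sort is a single pass.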

"They laughed when I sat down at the keyboard and coded a bubblesort..."

But seriously: Bubblesort is close, but not quite. Bubblesort repeatedly moves in one direction, so the comparison site "bubbles" upward all the time. If there's a low-key value near the top end of the array, it takes many iterations of the main loop for that item to bubble down against the current. That's pretty much worst-case behavior, which for Bubblesort is disastrous.

But there's a refinement to Bubblesort, sometimes called Cocktail Sort, where the bubble moves in alternating directions: one pass up, one pass down, repeat. This permits a single element to move a long distance in a single pass (or actually, 2 passes), and the number of passes is proportional to the number of elements that need moving. For a small number of unsorted elements, this can be reasonably efficient.
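A rough sketch of the alternating-direction idea described above, in Python. The function name and the shrinking `lo`/`hi` bounds are my own choices for illustration, not something taken from the answer.

```python
def cocktail_sort(items):
    """Bidirectional bubble sort: sorts a list in place and returns it."""
    lo, hi = 0, len(items) - 1
    swapped = True
    while swapped and lo < hi:
        swapped = False
        # Upward pass: carries the largest remaining value to position hi.
        for i in range(lo, hi):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        hi -= 1
        if not swapped:
            break  # nothing moved, so the list is already sorted
        swapped = False
        # Downward pass: carries the smallest remaining value to position lo.
        for i in range(hi, lo, -1):
            if items[i - 1] > items[i]:
                items[i - 1], items[i] = items[i], items[i - 1]
                swapped = True
        lo += 1
    return items
```

With an input such as `[2, 3, 4, 5, 6, 1]`, the low value at the top end is carried all the way down by the first downward pass, which is exactly the case where a one-directional bubble sort needs one pass per position.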


I believe that for the general case, the second link in marek's answer will be faster. The advantage of Bubble/Cocktail sort is that it's so simple, it's virtually foolproof, and not a lot of work.

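If it is already ordered to such a high degree, and the not-quite-sorted elements aren't far from their correct positions, then this may be one of the few cases where a bubble sort is actually useful.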

Google turns up plenty of results for this. For example, this article outlines various ways of doing it: http://home.tiac.net/~cri/2004/ksort.html

Put them in an array. You don't want to mess with a 40k linked-list.

There is a very narrow case for CocktailSort (bubblesort in 2 directions). But that depends on what exactly that 1% unsorted means. If there are a few elements displaced, but close to their target positions, it might work.

But InsertionSort or ShellSort are almost always going to win. Even in the cases where CocktailSort would theoretically be better, the difference will be small. So they are the (much) safer bet.
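For reference, a minimal ShellSort sketch in Python. The gap sequence here (halving the gap each round) is the simplest textbook choice and is my assumption; the answer does not say which sequence it has in mind, and other sequences generally perform better.

```python
def shell_sort(items):
    """Sort a list in place with Shell sort (halving gaps) and return it."""
    gap = len(items) // 2
    while gap > 0:
        # A gapped insertion sort: each element is compared with the
        # elements `gap` positions before it.
        for i in range(gap, len(items)):
            value = items[i]
            j = i
            while j >= gap and items[j - gap] > value:
                items[j] = items[j - gap]
                j -= gap
            items[j] = value
        gap //= 2
    return items
```

The final round with `gap == 1` is a plain insertion sort, so ShellSort can be read as insertion sort with a few long-distance moves done first. That helps when a displaced element is far from home, which is the case plain insertion sort handles worst.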

As with most questions of this sort, the answer is "That depends...". Do you care if the sort is stable, i.e. if elements whose keys are equal retain their original relative ordering after being sorted? Do you just care about raw speed? Is simplicity of implementation important? Does memory consumption matter?

Personally, I will always choose a stable sort algorithm, as I'm willing to sacrifice some raw speed for what I consider to be "reasonable" behavior, and non-stable sorting is too often "unreasonable". So I tend to go with the merge-sort algorithm, which is fast and reasonably simple, but it does use extra memory. Another advantage of merge-sort is that, in its natural (run-detecting) variants, the complexity is O(n) if the data is already sorted, so for nearly-sorted data it should be close to O(n). A plain top-down merge-sort stays at O(n log n) regardless of the input order.
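The near-linear behavior on sorted input comes from a merge sort that detects existing runs before merging, often called a natural merge sort. Below is a hedged Python sketch of that variant. The function names are illustrative, and this is one way to get the adaptive behavior, not necessarily the implementation the answer had in mind.

```python
def _merge(left, right):
    """Merge two sorted lists into a new sorted list, stably."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left run on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def natural_merge_sort(items):
    """Return a new sorted list built by merging existing sorted runs."""
    # Split the input into maximal non-decreasing runs.
    runs = []
    start = 0
    for i in range(1, len(items) + 1):
        if i == len(items) or items[i] < items[i - 1]:
            runs.append(items[start:i])
            start = i
    # Merge neighbouring runs pairwise until a single run remains.
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs) - 1, 2):
            merged.append(_merge(runs[k], runs[k + 1]))
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out, carried to the next round
        runs = merged
    return runs[0] if runs else []
```

Already-sorted input forms a single run, so no merging happens at all. With r runs the cost is roughly O(n log r), which stays small when only a few elements are out of place.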

YMMV.

Is performance critical (as verified by a profiler)? Otherwise, just use your framework/language's default sort (probably quicksort). It will perform decently.
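As one illustration of leaning on the default sort: in CPython the built-in `sorted()` is a stable, adaptive merge sort (Timsort) rather than quicksort, so it already takes advantage of nearly-sorted input. The sketch below counts comparisons instead of timing, to keep the result repeatable. The `Counted` wrapper, the fixed seed, and the choice of 200 random swaps to stand in for "99% sorted" are all assumptions made for the example.

```python
import random


class Counted:
    """Wraps a value and counts how often the sort compares it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def count_comparisons(values):
    """Sort the values with the built-in sort; return the comparison count."""
    Counted.comparisons = 0
    sorted(Counted(v) for v in values)
    return Counted.comparisons


rng = random.Random(0)  # fixed seed so the run is repeatable
nearly_sorted = list(range(40_000))
for _ in range(200):  # 200 swaps displace roughly 1% of the elements
    a, b = rng.randrange(40_000), rng.randrange(40_000)
    nearly_sorted[a], nearly_sorted[b] = nearly_sorted[b], nearly_sorted[a]

shuffled = nearly_sorted[:]
rng.shuffle(shuffled)

print("nearly sorted:", count_comparisons(nearly_sorted))
print("shuffled:     ", count_comparisons(shuffled))
```

Other languages and frameworks make different choices for their default sort, so it is worth checking the documentation and profiling before writing anything by hand.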
