QuickSort and Hoare Partition

I'm having a hard time translating QuickSort with Hoare partitioning into C code, and I can't figure out why. The code I'm using is shown below:

void QuickSort(int a[],int start,int end) {
    int q=HoarePartition(a,start,end);
    if (end<=start) return;
    QuickSort(a,q+1,end);
    QuickSort(a,start,q);
}

int HoarePartition (int a[],int p, int r) {
    int x=a[p],i=p-1,j=r;
    while (1) {
        do  j--; while (a[j] > x);
        do  i++; while (a[i] < x);

        if  (i < j)
            swap(&a[i],&a[j]);
        else
            return j;
    }
}

Also, I don't really get why HoarePartition works. Can someone explain why it works, or at least link me to an article that does?

I have seen a step-by-step work-through of the partitioning algorithm, but I don't have an intuitive feel for it. In my code, it doesn't even seem to work. For example, given the array

13 19  9  5 12  8  7  4 11  2  6 21

It will use pivot 13, but end up with the array

 6  2  9  5 12  8  7  4 11 19 13 21 

And it will return j, where a[j] = 11. I thought that the array starting at that point and going forward should have values that are all larger than the pivot, but that isn't true here because 11 < 13.

Here's pseudocode for Hoare partitioning (from CLRS, second edition), in case this is useful:

Hoare-Partition (A, p, r)
    x ← A[p]
    i ← p − 1
    j ← r + 1
    while  TRUE
        repeat   j ←  j − 1
            until     A[j] ≤ x
        repeat   i ←  i + 1
            until     A[i] ≥ x
        if  i < j
            exchange  A[i] ↔ A[j]
        else  return   j 

Thanks!

EDIT:

The right C code for this problem will end up being:

void QuickSort(int a[],int start,int end) {
    int q;
    if (end-start<2) return;
    q=HoarePartition(a,start,end);
    QuickSort(a,start,q);
    QuickSort(a,q,end);
}

int HoarePartition (int a[],int p, int r) {
    int x=a[p],i=p-1,j=r;
    while (1) {
        do  j--; while (a[j] > x);
        do  i++; while (a[i] < x);
        if  (i < j) 
            swap(&a[i],&a[j]);
        else 
            return j+1;
    }
}

To answer the question of "Why does Hoare partitioning work?":

Let's simplify the values in the array to just three kinds: L values (those less than the pivot value), E values (those equal to the pivot value), and G values (those larger than the pivot value).

We'll also give a special name to one location in the array; we'll call this location s, and it's the location where the j pointer is when the procedure finishes. Do we know ahead of time which location s is? No, but we know that some location will meet that description.

With these terms, we can express the goal of the partitioning procedure in slightly different terms: it is to split a single array into two smaller sub-arrays which are not mis-sorted with respect to each other. That "not mis-sorted" requirement is satisfied if the following conditions are true:

  1. The "low" sub-array, which goes from the left end of the array up to and including s, contains no G values.
  2. The "high" sub-array, which starts immediately after s and continues to the right end, contains no L values.
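
These two conditions can be checked mechanically. Below is a minimal sketch, assuming inclusive bounds lo..hi and a split index s as described above; the helper name partition_is_valid is hypothetical and does not appear in the original posts.

```c
#include <stdbool.h>

/* Returns true when a[lo..s] holds no value greater than the pivot
   (no G values) and a[s+1..hi] holds no value less than the pivot
   (no L values). Bounds are inclusive; the name is illustrative only. */
bool partition_is_valid(const int a[], int lo, int s, int hi, int pivot) {
    for (int k = lo; k <= s; k++)
        if (a[k] > pivot) return false;   /* G value in the low sub-array */
    for (int k = s + 1; k <= hi; k++)
        if (a[k] < pivot) return false;   /* L value in the high sub-array */
    return true;
}
```

For the partitioned array from the question (6 2 9 5 12 8 7 4 11 19 13 21, pivot 13), the check passes with s = 8 and fails with s = 9, because a[9] = 19 would then be a G value in the low sub-array. E values are accepted on either side, which matches the point that their placement does not matter.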

That's really all we need to do. We don't even need to worry where the E values wind up on any given pass. As long as each pass gets the sub-arrays right with respect to each other, later passes will take care of any disorder that exists inside any sub-array.

So now let's address the question from the other side: how does the partitioning procedure ensure that there are no G values in s or to the left of it, and no L values to the right of s?

Well, "the set of values to the right of s" is the same as "the set of cells the j pointer moves over before it reaches s". And "the set of values to the left of and including s" is the same as "the set of values that the i pointer moves over before j reaches s".

That means that any values which are misplaced will, on some iteration of the loop, be under one of our two pointers. (For convenience, let's say it's the j pointer pointing at an L value, though it works exactly the same for the i pointer pointing at a G value.) Where will the i pointer be, when the j pointer is on a misplaced value? We know it will be:

  1. at a location in the "low" subarray, where the L value can go with no problems;
  2. pointing at a value that's either an E or a G value, which can easily replace the L value under the j pointer. (If it wasn't on an E or a G value, it wouldn't have stopped there.)

Note that sometimes the i and j pointers will actually both stop on E values. When this happens, the values will be switched, even though there's no need for it. This doesn't do any harm, though; we said before that the placement of the E values can't cause mis-sorting between the sub-arrays.
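
To see these E-value swaps concretely, here is a small sketch that counts exchanges. It assumes the inclusive-bounds form of the partition (j starting at r + 1, as in the CLRS pseudocode); the swaps out-parameter and the function name are additions made for illustration.

```c
/* Hoare partition on a[p..r] (inclusive bounds) that also counts swaps.
   The `swaps` out-parameter is an illustrative addition. */
int hoare_partition_counting(int a[], int p, int r, int *swaps) {
    int x = a[p], i = p - 1, j = r + 1;
    *swaps = 0;
    for (;;) {
        do j--; while (a[j] > x);   /* stops on an L or E value */
        do i++; while (a[i] < x);   /* stops on a G or E value */
        if (i >= j) return j;
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        (*swaps)++;
    }
}
```

On an array of six equal values (indices 0..5), both pointers stop on every cell, so the function performs three swaps that change nothing and returns 2. The split is therefore 3 and 3: the redundant swaps are also what keeps the two sub-arrays balanced when many keys are equal.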

So, to sum up, Hoare partitioning works because:

  1. It separates an array into smaller sub-arrays which are not mis-sorted relative to each other;
  2. If you keep doing that and recursively sorting the sub-arrays, eventually there will be nothing left of the array that's unsorted.

I believe that there are two problems with this code. For starters, in your Quicksort function, I think you want to reorder the lines

 int q=HoarePartition(a,start,end);
 if (end<=start) return;

so that you have them like this:

 if (end<=start) return;
 int q=HoarePartition(a,start,end);

However, you should do even more than this; in particular this should read

 if (end - start < 2) return;
 int q=HoarePartition(a,start,end);

The reason for this is that the Hoare partition fails to work correctly if the range you're trying to partition has size zero or one. In my edition of CLRS this isn't mentioned anywhere; I had to go to the book's errata page to find this. This is almost certainly the cause of the problem you encountered with the "access out of range" error, since with that invariant broken you might run right off the array!
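
Read this way, end - start is the size of the range, which only holds if end is one past the last element. Here is a minimal sketch under that assumption (a half-open range [start, end), so the top-level call passes the element count). The swap helper is never shown in the thread and is assumed here to be the usual pointer swap.

```c
/* Exchange two ints through pointers (assumed helper; not shown in the thread). */
void swap(int *x, int *y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Partition the half-open range a[p..r-1] around the pivot a[p].
   Returns q such that every element of a[p..q-1] is <= every
   element of a[q..r-1]. */
int HoarePartition(int a[], int p, int r) {
    int x = a[p], i = p - 1, j = r;    /* j starts one past the last element */
    while (1) {
        do j--; while (a[j] > x);
        do i++; while (a[i] < x);
        if (i < j)
            swap(&a[i], &a[j]);
        else
            return j + 1;              /* first index of the high part */
    }
}

/* Sort a[start..end-1]; `end` is exclusive. */
void QuickSort(int a[], int start, int end) {
    if (end - start < 2) return;       /* size 0 or 1: nothing to do */
    int q = HoarePartition(a, start, end);
    QuickSort(a, start, q);
    QuickSort(a, q, end);
}
```

For an array of n elements the call would be QuickSort(a, 0, n). Passing n - 1 under this convention would leave the last element out of the sort, which is consistent with the 21 staying untouched in the question's trace.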

As for an analysis of Hoare partitioning, I would suggest starting off by just tracing through it by hand. There's also a more detailed analysis here. Intuitively, it works by growing two ranges from the ends of the range toward one another: one on the left-hand side containing elements smaller than the pivot, and one on the right-hand side containing elements larger than the pivot. This can be slightly modified to produce the Bentley-McIlroy partitioning algorithm (referenced in the link) that scales nicely to handle equal keys.

Hope this helps!

Your final code is wrong, since the initial value of j should be r + 1 instead of r. Otherwise your partition function always ignores the last value.

Actually, HoarePartition works because, for any array A[p...r] that contains at least 2 elements (i.e. p < r), every element of A[p...j] is <= every element of A[j+1...r] when it terminates. So the next two segments that the main algorithm recurses on are [start...q] and [q+1...end].

So the right C code is as follows:

void QuickSort(int a[],int start,int end) {
    if (end <= start) return;
    int q=HoarePartition(a,start,end);
    QuickSort(a,start,q);
    QuickSort(a,q + 1,end);
}

int HoarePartition (int a[],int p, int r) {
    int x=a[p],i=p-1,j=r+1;
    while (1) {
        do  j--; while (a[j] > x);
        do  i++; while (a[i] < x);
        if  (i < j) 
            swap(&a[i],&a[j]);
        else 
            return j;
    }
}

More clarifications:

  1. The partition part is just a translation of the pseudocode. (Note the return value is j.)

  2. For the recursive part, note the base case check: it is end <= start rather than end <= start + 1, otherwise you will skip the [2 1] subarray.
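
The second point can be checked directly. Below is a self-contained sketch with inclusive bounds; the swap helper (assumed, since the thread never shows it) is the usual pointer swap, and QuickSortEagerBase is a hypothetical variant added only to show what goes wrong.

```c
/* Exchange two ints through pointers (assumed helper; not shown in the thread). */
void swap(int *x, int *y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Hoare partition of a[p..r], both bounds inclusive. */
int HoarePartition(int a[], int p, int r) {
    int x = a[p], i = p - 1, j = r + 1;
    while (1) {
        do j--; while (a[j] > x);
        do i++; while (a[i] < x);
        if (i < j)
            swap(&a[i], &a[j]);
        else
            return j;
    }
}

/* Base case from the answer: stop only when fewer than two elements remain. */
void QuickSort(int a[], int start, int end) {
    if (end <= start) return;
    int q = HoarePartition(a, start, end);
    QuickSort(a, start, q);
    QuickSort(a, q + 1, end);
}

/* Hypothetical variant with the too-eager base case, for comparison only. */
void QuickSortEagerBase(int a[], int start, int end) {
    if (end <= start + 1) return;      /* also skips two-element ranges */
    int q = HoarePartition(a, start, end);
    QuickSortEagerBase(a, start, q);
    QuickSortEagerBase(a, q + 1, end);
}
```

With a[] = {2, 1}, the call QuickSortEagerBase(a, 0, 1) returns immediately and leaves the array as 2 1, while QuickSort(a, 0, 1) produces 1 2.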

Your last C code works, but it's not intuitive. I happen to be studying CLRS right now. In my opinion, the pseudocode in CLRS (2nd edition) is wrong. In the end, I found that it becomes right after changing one place.

Hoare-Partition (A, p, r)
    x ← A[p]
    i ← p − 1
    j ← r + 1
    while  TRUE
        repeat   j ←  j − 1
            until     A[j] ≤ x
        repeat   i ←  i + 1
            until     A[i] ≥ x
        if  i < j
            exchange  A[i] ↔ A[j]
        else
            exchange  A[r] ↔ A[i]
            return   i

Yes, adding an exchange A[r] ↔ A[i] makes it work. Why? Because A[i] is now bigger than A[r], or i == r. So we must exchange to guarantee the partition property.

  1. Move the pivot to the first position. (E.g., use median of three. Switch to insertion sort for small input sizes.)
  2. Partition:
    • Repeatedly swap the currently leftmost 1 with the currently rightmost 0.
      0 -- cmp(val, pivot) == true, 1 -- cmp(val, pivot) == false.
      Stop if not left < right.
    • After that, swap the pivot with the rightmost 0.
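
The steps above can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes cmp(val, pivot) means val < pivot, takes a[lo] as the pivot without the median-of-three step, and leaves out the insertion-sort cutoff. All function names are hypothetical.

```c
/* Exchange two ints through pointers. */
void swap_ints(int *x, int *y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Partition a[lo..hi] (inclusive) with the pivot kept at a[lo].
   A "0" is a value < pivot, a "1" is a value >= pivot.
   Returns the pivot's final index. */
int partition_pivot_first(int a[], int lo, int hi) {
    int pivot = a[lo];
    int left = lo + 1, right = hi;
    for (;;) {
        while (left <= right && a[left] < pivot) left++;     /* find leftmost 1 */
        while (right >= left && a[right] >= pivot) right--;  /* find rightmost 0 */
        if (left >= right) break;                            /* stop if not left < right */
        swap_ints(&a[left], &a[right]);
    }
    /* right is now the rightmost 0, or lo if there is no 0 at all */
    swap_ints(&a[lo], &a[right]);
    return right;
}

void quick_sort_pivot_first(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int m = partition_pivot_first(a, lo, hi);
    quick_sort_pivot_first(a, lo, m - 1);   /* pivot at m is already in place */
    quick_sort_pivot_first(a, m + 1, hi);
}
```

Unlike the Hoare partition discussed above, this variant puts the pivot in its final position, so the recursion can exclude index m on both sides. On the question's array with pivot 13, the partition step returns 9 and leaves 13 at index 9.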

First of all, you misunderstood Hoare's partition algorithm, as can be seen from your translated C code.

I'll explain it taking the leftmost element of the subarray as the pivot.

int HoarePartition (int a[],int p, int r) 

Here p and r represent the lower and upper bounds of the array to be partitioned, which can also be part of a larger array (a subarray).

So we start with the markers initially pointing just before and just after the end points of the array (simply because we are using do-while loops). Therefore,

i=p-1,
j=r+1;    // this is where you made the mistake

Now, as per partitioning, we want every element in the left part to be less than or equal to the pivot, and every element in the right part to be greater than or equal to it.

So we move the 'i' marker until we reach an element that is greater than or equal to the pivot, and similarly the 'j' marker until we find an element less than or equal to the pivot.

Now if i < j we swap the elements, because both elements are in the wrong part of the array. So the code will be

do  j--; while (a[j] > x);                  // look at the inequality sign: stop once a[j] <= x
do  i++; while (a[i] < x);                  // stop once a[i] >= x
if  (i < j) 
    swap(&a[i],&a[j]);

Now if 'i' is not less than 'j', there is no element left in between to swap, so we return the 'j' position.

So now, after partitioning, the lower half of the array runs from 'start' to 'j'

and the upper half runs from 'j+1' to 'end',

so the code will look like

void QuickSort(int a[],int start,int end) {
    if (end<=start) return;              // check the base case before partitioning
    int q=HoarePartition(a,start,end);
    QuickSort(a,start,q);
    QuickSort(a,q+1,end);
}

Straightforward implementation in Java.

public class QuickSortWithHoarePartition {

    public static void sort(int[] array) {
        sortHelper(array, 0, array.length - 1);
    }

    private static void sortHelper(int[] array, int p, int r) {
        if (p < r) {
            int q = doHoarePartitioning(array, p, r);
            sortHelper(array, p, q);
            sortHelper(array, q + 1, r);
        }
    }

    private static int doHoarePartitioning(int[] array, int p, int r) {
        int pivot = array[p];
        int i = p - 1;
        int j = r + 1;

        while (true) {

            do {
                i++;
            }
            while (array[i] < pivot);

            do {
                j--;
            }
            while (array[j] > pivot);

            if (i < j) {
                swap(array, i, j);
            } else {
                return j;
            }
        }

    }

    private static void swap(int[] array, int i, int j) {
        int temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }
}
