Can't get the radix sort algorithm to work in C++

Given n 32-bit integers (assume they are positive), you want to sort them by first looking at the shift most significant bits and then recursively sorting each bucket created by grouping the integers on those bits.

So if shift is 2, you first look at the two most significant bits of each 32-bit integer and apply counting sort. Then you recurse on each resulting group and sort its numbers by the third and fourth most significant bits, and so on until all bits have been used.

My code is as follows:

void radix_sortMSD(int start, int end, 
          int shift, int currentDigit, int input[])
{

    if(end <= start+1 || currentDigit>=32) return;

    /*
     find total amount of buckets
     which is basically 2^(shift)
    */
    long long int numberOfBuckets = (1UL<<shift);

    /*
     initialize a temporary array 
     that will hold the sorted input array
     after finding the values of each bucket.   
    */

    int tmp[end];

   /*
     Allocate memory for the buckets.
   */
   int *buckets = new int[numberOfBuckets + 1];

   /*
       initialize the buckets,
        we don't care about what's 
     happening in position numberOfBuckets+1
   */
   for(int p=0;p<numberOfBuckets + 1;p++)
         buckets[p] = 0;

   //update the buckets
   for (int p = start; p < end; p++)
      buckets[((input[p] >> (32 - currentDigit - shift)) 
                &   (numberOfBuckets-1)) + 1]++;

   //find the accumulative sum
   for(int p = 1; p < numberOfBuckets + 1; p++)
       buckets[p] += buckets[p-1];

   //sort the input array input and store it in array tmp   
   for (int p = start; p < end; p++){ 
    tmp[buckets[((input[p] >> (32 - currentDigit- shift)) 
            & (numberOfBuckets-1))]++] = input[p];
    }

   //copy all the elements in array tmp to array input
   for(int p = start; p < end; p++)
          input[p] = tmp[p];

   //recurse on all the groups that have been created
   for(int p=0;p<numberOfBuckets;p++){
       radix_sortMSD(start+buckets[p], 
       start+buckets[p+1], shift, currentDigit+shift, input);
    }

    //free the memory of the buckets
    delete[] buckets;
}

  int main()
  {

        int a[] = {1, 3, 2, 1, 4, 8, 4, 3};
        int n = sizeof(a)/sizeof(int);
        radix_sortMSD(0,n, 2,0,a);
        return 0;
   }

I can think of only two possible issues in this code.

The first issue is whether I actually get the correct bits of the integers at every level. I assumed that currentDigit = 0 means I am at bit 32 (the most significant bit) of the integer. To get the next shift bits, I shift right by 32 - currentDigit - shift places and then apply an AND with a mask to keep the shift least significant bits of the result, which are exactly the bits I want.
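The bit extraction described above can be checked in isolation. Below is a minimal sketch, assuming unsigned 32-bit values; the helper name msdDigit is hypothetical and not part of the posted code. Note that every value in the sample input is below 16, so the top 28 bits are zero and the first 14 levels with shift = 2 put all elements into bucket 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original post): returns the `shift` bits
// that begin `currentDigit` bits below the most significant bit of a 32-bit value.
inline std::uint32_t msdDigit(std::uint32_t value, unsigned currentDigit, unsigned shift)
{
    // Keep every shift amount inside the range that is well defined for 32 bits.
    assert(shift >= 1 && shift < 32 && currentDigit + shift <= 32);
    const std::uint32_t mask = (std::uint32_t{1} << shift) - 1u;
    return (value >> (32u - currentDigit - shift)) & mask;
}
```

With shift = 2, msdDigit(0xC0000000u, 0, 2) picks bits 31..30, while msdDigit(8u, 28, 2) picks bits 3..2 of the value 8.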

The second issue is the recursion. I do not think I recurse on the right groups, but since I do not know whether the first issue is handled correctly, I cannot say more about this at the moment.
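On the recursion question, one possible reading of the posted code is that the scatter loop advances buckets[] as it writes, so afterwards buckets[p] is the end of group p rather than its start, and that tmp is filled at offsets relative to start but copied back with absolute indices. A minimal sketch of a single level that keeps the group boundaries intact follows. The name partitionByDigit, the separate cursor array, and the 16-bit limit on the digit width are assumptions made for illustration, not the original author's method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: stably partitions data[start, end) by the `bits` bits
// that begin at bit position `rightShift`. Returns bounds such that group b
// occupies data[bounds[b], bounds[b + 1]) afterwards.
std::vector<std::size_t> partitionByDigit(std::vector<std::uint32_t>& data,
                                          std::size_t start, std::size_t end,
                                          unsigned rightShift, unsigned bits)
{
    assert(start <= end && end <= data.size());
    assert(bits >= 1 && bits <= 16 && rightShift + bits <= 32);
    const std::size_t numberOfBuckets = std::size_t{1} << bits;
    const std::uint32_t mask = static_cast<std::uint32_t>(numberOfBuckets - 1);

    // bounds[b + 1] first holds the number of elements in group b ...
    std::vector<std::size_t> bounds(numberOfBuckets + 1, 0);
    for (std::size_t p = start; p < end; ++p)
        ++bounds[((data[p] >> rightShift) & mask) + 1];
    // ... then a prefix sum turns the counts into absolute boundaries.
    bounds[0] = start;
    for (std::size_t b = 1; b <= numberOfBuckets; ++b)
        bounds[b] += bounds[b - 1];

    // Scatter through a separate cursor array so that bounds is not modified.
    std::vector<std::size_t> cursor(bounds.begin(), bounds.end() - 1);
    std::vector<std::uint32_t> tmp(end - start);   // indexed relative to start
    for (std::size_t p = start; p < end; ++p) {
        const std::size_t b = (data[p] >> rightShift) & mask;
        tmp[cursor[b]++ - start] = data[p];
    }
    for (std::size_t p = start; p < end; ++p)
        data[p] = tmp[p - start];
    return bounds;
}
```

A recursive caller would then recurse on [bounds[b], bounds[b + 1]) for each group b, using the next lower digit.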

Any feedback on this would be appreciated.

Thank you in advance.

EDIT: added the main function to show how my radix function is called.

Another update: converted to a template on the array element type. The tmp array is now passed as a parameter. The copy steps were eliminated and a helper function was added to return the buffer that the sorted data ends up in. Tested with 4 million 64-bit unsigned integers, it works but it's slow. The fastest time was achieved with numberOfBits = 4. numberOfBits no longer has to divide the number of bits per element exactly.

To explain why MSD first is slow, I'll use a card sorter analogy. Imagine you have 1,000 cards, each with 3 digits, 000 to 999, in random order. Normally you run them through the sorter on the 3rd digit, ending up with 100 cards in each of the bins: bin 0 holds the cards with a "0", ..., bin 9 holds the cards with a "9". You then concatenate the cards from bin 0 to bin 9 and run them through the sorter again on the 2nd digit, and again on the 1st digit, resulting in a sorted set of cards. That's 3 runs with 1,000 cards on each run, so a total of 3,000 cards went through the sorter.

Now start with the randomly ordered cards again, and sort by the 1st digit. You can't concatenate the sets and run them through on the 2nd digit, because cards with higher 1st digits but lower 2nd digits would end up out of order. So now you have to do 10 runs with 100 cards each. This results in 100 sets of 10 cards each, which you run again through the sorter, resulting in 1,000 sets of 1 card each, and the cards are now sorted. So the number of cards run through the sorter is still 3,000, same as above, but you had to do 111 runs (1 with a 1,000-card set, 10 with 100-card sets, 100 with 10-card sets).
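The run counts in the analogy can be reproduced with a small simulation. This is only a sketch: SorterStats, runSorter, sortLSD, sortMSD and makeCards are hypothetical names, and the "random" order is a fixed permutation so that the result is repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the simulated card sorter is used.
struct SorterStats { std::size_t runs = 0; std::size_t cards = 0; };

// One run through the sorter: distributes the cards into 10 bins by the
// decimal digit selected by divisor (1, 10 or 100).
static std::vector<std::vector<int>> runSorter(const std::vector<int>& cards,
                                               int divisor, SorterStats& stats)
{
    assert(divisor >= 1);
    ++stats.runs;
    stats.cards += cards.size();
    std::vector<std::vector<int>> bins(10);
    for (int c : cards)
        bins[(c / divisor) % 10].push_back(c);
    return bins;
}

// Least significant digit first: every run handles the whole deck, and the
// bins are concatenated between runs.
std::vector<int> sortLSD(std::vector<int> cards, SorterStats& stats)
{
    for (int divisor = 1; divisor <= 100; divisor *= 10) {
        const auto bins = runSorter(cards, divisor, stats);
        cards.clear();
        for (const auto& bin : bins)
            cards.insert(cards.end(), bin.begin(), bin.end());
    }
    return cards;
}

// Most significant digit first: every non-empty bin needs its own run on the
// next digit before the bins can be concatenated.
std::vector<int> sortMSD(const std::vector<int>& cards, int divisor, SorterStats& stats)
{
    const auto bins = runSorter(cards, divisor, stats);
    std::vector<int> out;
    for (const auto& bin : bins) {
        if (bin.empty())
            continue;
        const std::vector<int> sorted = divisor > 1 ? sortMSD(bin, divisor / 10, stats) : bin;
        out.insert(out.end(), sorted.begin(), sorted.end());
    }
    return out;
}

// 1,000 distinct cards 000..999 in a fixed scrambled order
// (7 and 1000 are coprime, so this is a permutation).
std::vector<int> makeCards()
{
    std::vector<int> cards(1000);
    for (int i = 0; i < 1000; ++i)
        cards[i] = (i * 7 + 3) % 1000;
    return cards;
}
```

Calling sortLSD(makeCards(), stats) and sortMSD(makeCards(), 100, stats) with separate SorterStats objects gives 3 runs and 111 runs respectively, with 3,000 cards passing through the sorter in both cases.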

template <typename T>
void RadixSortMSD(size_t start, size_t end, 
          size_t numberOfBits, size_t currentBit, T input[], T tmp[])
{
    if((end - start) < 1)
        return;

    // adjust numberOfBits if currentBit close to end element
    if((currentBit + numberOfBits) > (8*sizeof(T)))
        numberOfBits = (8*sizeof(T)) - currentBit;

    // set numberOfBuckets
    size_t numberOfBuckets = 1 << numberOfBits;
    size_t bitMask = numberOfBuckets - 1;
    size_t shift = (8*sizeof(T)) - currentBit - numberOfBits;

    // create bucket info
    size_t *buckets = new size_t[numberOfBuckets+1];
    for(size_t p = 0; p < numberOfBuckets+1; p++)
        buckets[p] = 0;
    for(size_t p = start; p < end; p++)
        buckets[(input[p] >> shift) & bitMask]++;
    size_t m = start;
    for(size_t p = 0; p < numberOfBuckets+1; p++){
        size_t n = buckets[p];
        buckets[p] = m;
        m += n;
    }

    //sort the input array input and store it in array tmp   
    for (size_t p = start; p < end; p++){ 
        tmp[buckets[(input[p] >> shift) & bitMask]++] = input[p];
    }

    // restore bucket info
    for(size_t p = numberOfBuckets; p > 0; p--)
        buckets[p] = buckets[p-1];
    buckets[0] = start;

    // advance current bit
    currentBit += numberOfBits;
    if(currentBit < (8*sizeof(T))){
        //recurse on all the groups that have been created
        for(size_t p=0; p < numberOfBuckets; p++){
            RadixSortMSD(buckets[p], buckets[p+1],
                numberOfBits, currentBit, tmp, input);
        }
    }

    //free buckets
    delete[] buckets;
    return;
}

template <typename T>
T * RadixSort(T *pData, T *pTmp, size_t n)
{
    size_t numberOfBits = 4;
    RadixSortMSD(0, n, numberOfBits, 0, pData, pTmp);
    // return the pointer to the sorted data
    if((((8*sizeof(T))+numberOfBits-1)/numberOfBits)&1)
        return pTmp;
    else
        return pData;
}
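The return expression in RadixSort comes down to the parity of the number of recursion levels: each level swaps the roles of the two buffers, so an odd number of levels leaves the sorted data in pTmp. A minimal sketch of that arithmetic, with hypothetical helper names radixLevels and endsInTmp that are not part of the code above:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of levels needed to consume bitsPerElement bits
// when each level consumes numberOfBits bits (the last level may use fewer).
inline std::size_t radixLevels(std::size_t bitsPerElement, std::size_t numberOfBits)
{
    assert(numberOfBits > 0);
    return (bitsPerElement + numberOfBits - 1) / numberOfBits;
}

// True when the sorted data ends up in the temporary buffer, which is the
// case for an odd number of levels.
inline bool endsInTmp(std::size_t bitsPerElement, std::size_t numberOfBits)
{
    return (radixLevels(bitsPerElement, numberOfBits) & 1) != 0;
}
```

For example, 64-bit elements with numberOfBits = 4 take 16 levels, so the data ends up back in pData; 32-bit elements with 5 bits per level take 7 levels, so it ends up in pTmp.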
