Very basic radix sort

I just wrote a simple iterative radix sort and I'm wondering whether I have the right idea.
Recursive implementations seem to be much more common.

I am sorting 4-byte integers (unsigned, to keep it simple).
I am using 1 byte as the 'digit', so I have 2^8 = 256 buckets.
I am sorting on the most significant digit (MSD) first.
After each pass I put the values back into the array in the order they appear in the buckets, and then perform the next pass.
So I end up doing 4 bucket sorts.
It seems to work for a small set of data. Since I am doing it MSD-first, I'm guessing it's not stable and may fail with different data.

Did I miss anything major?

#include <iostream>
#include <vector>
#include <list>

using namespace std;

void radix(vector<unsigned>&);
void print(const vector<list<unsigned> >& listBuckets);
unsigned getMaxForBytes(unsigned bytes);
void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets);

int main()
{
    unsigned d[] = {5,3,6,9,2,11,9, 65534, 4,10,17,13, 268435455, 4294967294,4294967293, 268435454,65537};
    vector<unsigned> v(d,d+17);

    radix(v);
    return 0;
}

void radix(vector<unsigned>& data)
{
    int bytes = 1;                                  //  How many bytes to compare at a time
    unsigned numOfBuckets = getMaxForBytes(bytes) + 1;
    cout << "Numbuckets" << numOfBuckets << endl;
    int chunks = sizeof(unsigned) / bytes;

    for(int i = chunks - 1; i >= 0; --i) 
    {
        vector<list<unsigned> > buckets;            // lazy, wasteful allocation
        buckets.resize(numOfBuckets);

        unsigned mask = getMaxForBytes(bytes);
        unsigned shift = i * bytes * 8;
        mask = mask << shift;

        for(unsigned j = 0; j < data.size(); ++j)
        {
            unsigned bucket = data[j] & mask;       //  isolate bits of current chunk
            bucket = bucket >> shift;               //  bring bits down to least significant

            buckets[bucket].push_back(data[j]); 
        }

        print(buckets);

        merge(data,buckets);
    }
}

unsigned getMaxForBytes(unsigned bytes)
{
    unsigned max = 0;
    for(unsigned i = 1; i <= bytes; ++i)
    {
        max = max << 8;
        max |= 0xFF;
    }

    return max;
}

void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets)
{
    int index = 0;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        list<unsigned>& list = listBuckets[i];
        std::list<unsigned>::const_iterator it = list.begin();

        for(; it != list.end(); ++it)
        {
            data[index] = *it;
            ++index;
        }
    }
}

void print(const vector<list<unsigned> >& listBuckets)
{
    cout << "Printing listBuckets: " << endl;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        const list<unsigned>& list = listBuckets[i];

        if(list.size() == 0) continue;

        std::list<unsigned>::const_iterator it = list.begin();  //  Why do I need std here!?
        for(; it != list.end(); ++it)
        {
            cout << *it << ", ";
        }

        cout << endl;
    }
}



Update:
It seems to work well in LSD form. The code above can be switched to LSD by changing the chunk loop in radix so that it starts at the least significant byte:

for(int i = 0; i < chunks; ++i)

Let's look at an example with two-digit decimal numbers:

49, 25, 19, 27, 87, 67, 22, 90, 47, 91

Sorting by the first digit yields:

19, 25, 27, 22, 49, 47, 67, 87, 90, 91

Next, you sort by the second digit, yielding:

90, 91, 22, 25, 27, 47, 67, 87, 19, 49

Seems wrong, doesn't it? Or isn't this what you are doing? Maybe you can show us the code if I got you wrong.
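
To make the two passes above easy to reproduce, here is a small sketch of a single stable bucket pass over decimal digits. The helper name bucketPass and its divisor parameter are hypothetical, introduced only for this illustration.

```cpp
#include <vector>

// Hypothetical helper: one stable bucket pass over decimal digits.
// divisor = 10 selects the tens digit, divisor = 1 the ones digit.
std::vector<int> bucketPass(const std::vector<int>& in, int divisor)
{
    std::vector<std::vector<int>> buckets(10);
    for (int v : in)
        buckets[(v / divisor) % 10].push_back(v);

    // Reassemble the whole array in bucket order.
    std::vector<int> out;
    for (const auto& b : buckets)
        out.insert(out.end(), b.begin(), b.end());
    return out;
}
```

Calling bucketPass(bucketPass(in, 10), 1) on the ten numbers above gives the scrambled order shown, while swapping the passes to bucketPass(bucketPass(in, 1), 10) gives the fully sorted sequence.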

If you are doing the second bucket sort separately on each group of values that share the same first digit(s), your algorithm is equivalent to the recursive version. It would be stable as well. The only difference is that you'd do the bucket sorts breadth-first instead of depth-first.

You also need to make sure you sort every bucket from MSD to LSD before reassembling. Example: 19, 76, 90, 34, 84, 12, 72, 38. Sort into 10 buckets [0-9] on the MSD:

B0=[]; B1=[19,12]; B2=[]; B3=[34,38]; B4=[]; B5=[]; B6=[]; B7=[76,72]; B8=[84]; B9=[90]

If you were to reassemble and then sort again, it would not work. Instead, recursively sort each bucket. B1 is sorted into B1B2=[12]; B1B9=[19]. Once all buckets have been sorted, you can reassemble correctly.
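
The recursive approach described here might be sketched as follows for non-negative decimal values. The function name msdSortDecimal and the divisor parameter are hypothetical; for two-digit numbers the first call would use divisor 10.

```cpp
#include <vector>

// Hypothetical helper: recursive MSD radix sort on non-negative decimal
// values. divisor selects the current digit (10 = tens, 1 = ones).
void msdSortDecimal(std::vector<int>& values, int divisor)
{
    // Nothing to do for tiny buckets or once all digits are consumed.
    if (values.size() < 2 || divisor == 0)
        return;

    // Distribute into 10 buckets by the current digit.
    std::vector<std::vector<int>> buckets(10);
    for (int v : values)
        buckets[(v / divisor) % 10].push_back(v);

    // Sort each bucket on the next digit before reassembling.
    values.clear();
    for (auto& b : buckets)
    {
        msdSortDecimal(b, divisor / 10);
        values.insert(values.end(), b.begin(), b.end());
    }
}
```

With the example input, bucket B1=[19,12] is itself split on the ones digit into [12] and [19] before it is appended to the output, which is exactly the step that the reassemble-then-sort-again approach skips.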
