
MSD Radix-sort (lexicographic order) in C++

Thanks for reading this post.

I wanted to create an MSD radix sort that sorts a vector of unsigned integers in lexicographic (alphabetical) order.

Given: "1, 3, 32, 1254, 3, 165, 50000, 11, 213"

Sorted: "1, 11, 1254, 165, 213, 3, 3, 32, 50000"

Since I think this should be done recursively, I tried to bucket the numbers by their highest digit and then call the function recursively on the next digit. However, I realized my logic was wrong: I was comparing all numbers at the same digit position counted from the right (e.g. the 5th digit, which is 0 for numbers with fewer than 5 digits), and that produces ordinary numeric order, not lexicographic order. So I abandoned this algorithm, but I couldn't come up with a new one.
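The core difference is which end the digits are aligned on. Comparing every number at the same position counted from the right (padding short numbers with leading zeros) yields numeric order. Lexicographic order aligns digits on the left, and when one number runs out of digits first it is a prefix of the other and sorts first, so 3 < 32 but 32 > 1254. Below is a minimal sketch of that comparison on unsigned integers without going through strings; `DigitsLeftToRight` and `LexLess` are hypothetical helper names, not part of any library:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: the decimal digits of n, most significant first.
// 0 yields {0}.
std::vector<int> DigitsLeftToRight(unsigned n) {
    std::vector<int> digits;
    do {
        digits.push_back(static_cast<int>(n % 10));  // collected least significant first
        n /= 10;
    } while (n != 0);
    std::reverse(digits.begin(), digits.end());      // now most significant first
    return digits;
}

// Lexicographic "less than" on the decimal representations of a and b:
// digits are compared from the left; if one sequence is a prefix of the
// other, the shorter one is smaller (std::lexicographical_compare's rule).
bool LexLess(unsigned a, unsigned b) {
    const std::vector<int> da = DigitsLeftToRight(a);
    const std::vector<int> db = DigitsLeftToRight(b);
    return std::lexicographical_compare(da.begin(), da.end(), db.begin(), db.end());
}
```

For example, `std::sort(v.begin(), v.end(), LexLess)` on a `std::vector<unsigned>` holding the input above yields 1, 11, 1254, 165, 213, 3, 3, 32, 50000. An MSD radix sort reaches the same order without comparisons by bucketing on the first digit, then recursing into each bucket on the second digit, with "no digit here" treated as a bucket smaller than 0.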

Since the numbers can have any number of digits, I think it should work recursively. I have some ideas now, but they don't seem to work:

  1. Since this is essentially alphabetical order, I could convert the integers to strings with std::to_string() and call std::sort(), but I don't think this is a good option: it sidesteps the algorithm I'm trying to write, and I don't know how to convert the strings back to unsigned integers.
  2. I wanted to find each number's leading digit by repeatedly dividing by 10 until the result is less than 10, and then sort by that digit. This doesn't work because the numbers have different lengths, and I can't recurse because the division has already thrown away the rest of the digits. I think I'm still stuck in the numeric-sorting mindset. I don't see how to make the recursion work when there is no fixed digit position, or any other common point, to compare on.
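On idea 1, converting back is the easy part: `std::stoul` parses a decimal string into an `unsigned long`. A minimal sketch, assuming every value fits in `unsigned` (`SortViaStrings` is a hypothetical name). Even if it isn't the algorithm you want to practice, it is a handy reference to check a radix-sort implementation against:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: lexicographic sort via decimal strings, then back to numbers.
std::vector<unsigned> SortViaStrings(const std::vector<unsigned>& input) {
    std::vector<std::string> strings;
    strings.reserve(input.size());
    for (unsigned n : input) strings.push_back(std::to_string(n));

    // std::string's operator< compares character by character: lexicographic order
    std::sort(strings.begin(), strings.end());

    std::vector<unsigned> result;
    result.reserve(strings.size());
    for (const std::string& s : strings) {
        // std::stoul parses base 10 by default; the value came from an unsigned, so it fits
        result.push_back(static_cast<unsigned>(std::stoul(s)));
    }
    return result;
}
```

If you don't need the strings afterwards, you can skip the round trip and sort the numbers directly with a comparator such as `[](unsigned a, unsigned b) { return std::to_string(a) < std::to_string(b); }`, at the cost of building two strings per comparison.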

Do you have any implementation ideas for this different kind of number sorting?

#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>   // std::ostream_iterator
#include <array>
#include <cmath>      // std::log10, std::pow
#include <numeric>
#include <cassert>

namespace details
{
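    const int kNoDigit = -1;

    // Returns the pos-th digit of i counted from the left (pos starts at 1),
    // or kNoDigit if i has fewer than pos digits. Assumes i >= 0.
    int ExtractDigit(int i, int pos) {
        // log10(0) is -infinity, so 0 is handled as the one-digit number "0"
        const int digitsCount = (i == 0) ? 1 : static_cast<int>(std::log10(i)) + 1;
        if (pos > digitsCount) return kNoDigit;
        return static_cast<int>(i / std::pow(10, digitsCount - pos)) % 10;
    }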

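    // Counts how many numbers fall into each bucket at digit position pos and
    // returns the start offset of every bucket: bins[0] is the "no digit"
    // bucket, bins[d + 1] is digit d, and bins[11] is the end of the range.
    // E.g. pos = 2 and {10, 20, 1} -> {0, 1, 3, 3, ...}: 1 has no second
    // digit, while 10 and 20 both have 0 as their second digit.
    template <class It>
    auto CountingSort(It begin, It end, int pos) {
        std::array<int, 12> bins{};  // zero-initialized
        for (auto it = begin; it < end; ++it) {
            const int digit = ExtractDigit(*it, pos);
            ++bins[digit + 1];
        }
        std::partial_sum(bins.cbegin(), bins.cend(), bins.begin());
        // Shift right by one to turn end offsets into start offsets. Source and
        // destination overlap, so copy_backward is needed (a forward std::move
        // into an overlapping range to the right is undefined behavior).
        std::copy_backward(bins.cbegin(), bins.cend() - 1, bins.end());
        bins[0] = 0;
        return bins;
    }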

    template <class It>
    void MsdRadixInternal(It begin, It end, int pos) {
        const auto bins = CountingSort(begin, end, pos);
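        // Move every element into its digit bucket, from digit 9 down to digit 0
        // (i = 10 .. 1). The no-digit bucket (index 0) needs no pass of its own:
        // whatever remains at the front belongs to it.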
        for (int i = 10; i > 0; --i) {
            const int digit = i - 1;
            const auto local_begin = begin + bins[i];
            const auto local_end = begin + bins[i + 1];
            if (local_begin == begin) break;
            if (std::distance(local_begin, local_end) > 0) {
                auto crsrForward = begin;
                auto crsrBackward = local_end - 1;
                while (crsrForward < crsrBackward) {
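                    // crsrForward can equal local_begin right after the final swap
                    // (e.g. {3, 3, 1} at pos 1), so the first check must be <=
                    assert(crsrForward <= local_begin && local_begin <= crsrBackward);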
                    while (ExtractDigit(*crsrBackward, pos) == digit) --crsrBackward;
                    while (ExtractDigit(*crsrForward, pos) != digit) ++crsrForward;
                    if (crsrForward < local_begin) {
                        std::swap(*crsrBackward, *crsrForward);
                    }
                    ++crsrForward;
                }
            }
        }
        // Start from 1: in this range, numbers with no digit at pos are all equal
        // (each is exactly the shared prefix), so that bucket is already sorted
        for (int i = 1; i < 11; ++i) {
            if (bins[i + 1] - bins[i] > 1)
                MsdRadixInternal(begin + bins[i], begin + bins[i + 1], pos + 1);
        }
    }
}

template <class It>
void MsdRadix(It begin, It end) {
    details::MsdRadixInternal(begin, end, 1);
}

int main()
{
    std::vector<int> v = { 1, 3, 32, 1254, 3, 165, 50000, 11, 213 };
    MsdRadix(v.begin(), v.end());
    std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, " "));
    return 0;
}

Output:

1 11 1254 165 213 3 3 32 50000

This implementation doesn't aim for efficiency; for example, ExtractDigit could be implemented in a much faster way.
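One way to make the digit extraction cheaper, and to avoid relying on `log10`/`pow` rounding exactly at powers of 10, is to stay in integer arithmetic with a small table of powers of 10. Because it divides a copy of the value by 10^(digitsCount - pos) instead of repeatedly dividing the number itself, nothing is lost between recursion levels, which is the problem idea 2 in the question ran into. A sketch assuming non-negative values in a 32-bit `int`, like the answer's code; `kPow10`, `DigitsCount` and `ExtractDigitFast` are hypothetical names, and this is only one option:

```cpp
#include <array>

// Hypothetical integer-only replacement for details::ExtractDigit.
// Powers of 10 that fit in a 32-bit int.
constexpr std::array<int, 10> kPow10 = {1,         10,        100,      1000,
                                        10000,     100000,    1000000,  10000000,
                                        100000000, 1000000000};
constexpr int kNoDigit = -1;

// Number of decimal digits in i (assumes 0 <= i); 0 counts as one digit.
int DigitsCount(int i) {
    int count = 1;
    while (count < 10 && i >= kPow10[count]) ++count;
    return count;
}

// Same contract as the answer's ExtractDigit: pos is 1-based from the left
// (assumes pos >= 1), and kNoDigit is returned when i has fewer than pos digits.
int ExtractDigitFast(int i, int pos) {
    const int digitsCount = DigitsCount(i);
    if (pos > digitsCount) return kNoDigit;
    // i is a by-value copy, so the caller's number is untouched
    return (i / kPow10[digitsCount - pos]) % 10;
}
```

To try it in the answer's code, the body of `details::ExtractDigit` could simply call this. Since a number's digit count never changes, a further step would be to compute it once per element instead of on every call.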

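Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this, please credit this site or the original source. For any questions, contact yoyou2525@163.com.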

 