简体   繁体   English

按字典顺序排序C ++的整数数组

[英]sort array of integers lexicographically C++

I want to sort a large array of integers (say 1 millon elements) lexicographically. 我想按字典顺序对大量整数(比如说1个元素)进行排序。

Example: 例:

input [] = { 100, 21 , 22 , 99 , 1  , 927 }
sorted[] = { 1  , 100, 21 , 22 , 927, 99  }

I have done it using the simplest possible method: 我用最简单的方法完成了它:

  • convert all numbers to strings (very costly because it will take huge memory) 将所有数字转换为字符串(非常昂贵,因为它将占用大量内存)
  • use std:sort with strcmp as comparison function 使用std:sort使用strcmp作为比较函数进行std:sort
  • convert back the strings to integers 将字符串转换回整数

Is there a better method than this? 有没有比这更好的方法?

Use std::sort() with a suitable comparison function. 使用带有合适比较函数的std::sort() This cuts down on the memory requirements. 这减少了内存需求。

The comparison function can use n % 10 , n / 10 % 10 , n / 100 % 10 etc. to access the individual digits (for positive integers; negative integers work a bit differently). 比较函数可以使用n % 10n / 10 % 10n / 100 % 10等来访问各个数字(对于正整数;负整数的工作方式略有不同)。

To provide any custom sort ordering, you can provide a comparator to std::sort . 要提供任何自定义排序顺序,您可以为std::sort提供比较器。 In this case, it's going to be somewhat complex, using logarithms to inspect individual digits of your number in base 10. 在这种情况下,它会有些复杂,使用对数来检查基数为10的个数。

Here's an example — comments inline describe what's going on. 这是一个例子 - 评论内联描述了正在发生的事情。

#include <iostream>
#include <algorithm>
#include <cmath>
#include <cassert>

int main() {
    int input[] { 100, 21, 22, 99, 1, 927, -50, -24, -160 };

    /**
     * Sorts the array lexicographically.
     * 
     * The trick is that we have to compare digits left-to-right
     * (considering typical Latin decimal notation) and that each of
     * two numbers to compare may have a different number of digits.
     * 
     * This is very efficient in storage space, but inefficient in
     * execution time; an approach that pre-visits each element and
     * stores a translated representation will at least double your
     * storage requirements (possibly a problem with large inputs)
     * but require only a single translation of each element.
     */
    std::sort(
        std::begin(input),
        std::end(input),
        [](int lhs, int rhs) -> bool {
            // Returns true if lhs < rhs
            // Returns false otherwise
            const auto BASE      = 10;
            const bool LHS_FIRST = true;
            const bool RHS_FIRST = false;
            const bool EQUAL     = false;


            // There's no point in doing anything at all
            // if both inputs are the same; strict-weak
            // ordering requires that we return `false`
            // in this case.
            if (lhs == rhs) {
                return EQUAL;
            }

            // Compensate for sign
            if (lhs < 0 && rhs < 0) {
                // When both are negative, sign on its own yields
                // no clear ordering between the two arguments.
                // 
                // Remove the sign and continue as for positive
                // numbers.
                lhs *= -1;
                rhs *= -1;
            }
            else if (lhs < 0) {
                // When the LHS is negative but the RHS is not,
                // consider the LHS "first" always as we wish to
                // prioritise the leading '-'.
                return LHS_FIRST;
            }
            else if (rhs < 0) {
                // When the RHS is negative but the LHS is not,
                // consider the RHS "first" always as we wish to
                // prioritise the leading '-'.
                return RHS_FIRST;
            }

            // Counting the number of digits in both the LHS and RHS
            // arguments is *almost* trivial.
            const auto lhs_digits = (
                lhs == 0
                ? 1
                : std::ceil(std::log(lhs+1)/std::log(BASE))
            );

            const auto rhs_digits = (
                rhs == 0
                ? 1
                : std::ceil(std::log(rhs+1)/std::log(BASE))
            );

            // Now we loop through the positions, left-to-right,
            // calculating the digit at these positions for each
            // input, and comparing them numerically. The
            // lexicographic nature of the sorting comes from the
            // fact that we are doing this per-digit comparison
            // rather than considering the input value as a whole.
            const auto max_pos = std::max(lhs_digits, rhs_digits);
            for (auto pos = 0; pos < max_pos; pos++) {
                if (lhs_digits - pos == 0) {
                    // Ran out of digits on the LHS;
                    // prioritise the shorter input
                    return LHS_FIRST;
                }
                else if (rhs_digits - pos == 0) {
                    // Ran out of digits on the RHS;
                    // prioritise the shorter input
                    return RHS_FIRST;
                }
                else {
                    const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                    const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                    if (lhs_x < rhs_x)
                        return LHS_FIRST;
                    else if (rhs_x < lhs_x)
                        return RHS_FIRST;
                }
            }

            // If we reached the end and everything still
            // matches up, then something probably went wrong
            // as I'd have expected to catch this in the tests
            // for equality.
            assert("Unknown case encountered");
        }
    );

    std::cout << '{';
    for (auto x : input)
        std::cout << x << ", ";
    std::cout << '}';

    // Output: {-160, -24, -50, 1, 100, 21, 22, 927, 99, }
}

Demo 演示

There are quicker ways to calculate the number of digits in a number , but the above will get you started. 有更快的方法来计算数字中的位数 ,但上面的内容将帮助您入门。

Here's another algorithm which does some of the computation before sorting. 这是另一种在排序之前完成一些计算的算法。 It seems to be quite fast, despite the additional copying (see comparisons ). 尽管有额外的复制(见比较 ),它似乎相当快。

Note: 注意:

  • it only supports positive integers 它只支持正整数
  • in only supports integers <= std::numeric_limits<int>::max()/10 in仅支持整数<= std::numeric_limits<int>::max()/10

NB you can optimize count_digits and my_pow10 ; 注意,你可以优化count_digitsmy_pow10 ; for example, see Three Optimization Tips for C++ from Andrei Alexandrescu and Any way faster than pow() to compute an integer power of 10 in C++? 例如,请参阅Andrei Alexandrescu 撰写的 三个C ++优化技巧,以及比pow()更快地计算C ++中10的整数幂?

Helpers: 助手:

#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <iostream>
#include <algorithm>
#include <limits>
#include <iterator>

// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
    int res = 0;
    for(; p != 0; ++res)
    {
        p /= 10;
    }
    return res;
}

// non-optimized version
int my_pow10(unsigned exp)
{
    int res = 1;
    for(; exp != 0; --exp)
    {
        res *= 10;
    }
    return res;
}

Algorithm (note - not in-place): 算法(注意 - 不是就地):

// helper to provide integers with the same number of digits
template<class T, class U>
std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
{
    auto const digits = count_digits(p);
    // append zeros so that `l` has `maxDigits` digits
    auto const l = static_cast<T>( p  * my_pow10(maxDigits-digits) );
    return {l, p};
}

template<class RaIt>
using pair_vec
    = std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
                            typename std::iterator_traits<RaIt>::value_type>>;

template<class RaIt>
pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
{
    if(p_beg == p_end) return {};

    auto max = *std::max_element(p_beg, p_end);
    auto maxDigits = count_digits(max);

    pair_vec<RaIt> result;
    result.reserve( std::distance(p_beg, p_end) );

    for(auto i = p_beg; i != p_end; ++i)
        result.push_back( lexicographic_pair_helper(*i, maxDigits) );

    using value_type = typename pair_vec<RaIt>::value_type;

    std::sort(begin(result), end(result),
              [](value_type const& l, value_type const& r)
              {
                  if(l.first < r.first) return true;
                  if(l.first > r.first) return false;
                  return l.second < r.second; }
             );

    return result;
}

Usage example: 用法示例:

int main()
{
    std::vector<int> input = { 100, 21 , 22 , 99 , 1  , 927 };
    // generate some numbers
    /*{
        constexpr int number_of_elements = 1E6;
        std::random_device rd;
        std::mt19937 gen( rd() );
        std::uniform_int_distribution<>
            dist(0, std::numeric_limits<int>::max()/10);
        for(int i = 0; i < number_of_elements; ++i)
            input.push_back( dist(gen) );
    }*/

    std::cout << "unsorted: ";
    for(auto const& e : input) std::cout << e << ", ";
    std::cout << "\n\n";


    auto sorted = lexicographic_sort(begin(input), end(input));

    std::cout << "sorted: ";
    for(auto const& e : sorted) std::cout << e.second << ", ";
    std::cout << "\n\n";
}

Here's a community wiki to compare the solutions. 这是一个社区维基,用于比较解决方案。 I took nim's code and made it easily extensible. 我使用了nim的代码并使其易于扩展。 Feel free to add your solutions and outputs. 随意添加您的解决方案和输出。

Sample runs an old slow computer (3 GB RAM, Core2Duo U9400) with g++4.9 @ -O3 -march=native : 示例运行旧的慢速计算机(3 GB RAM,Core2Duo U9400)与g ++ 4.9 @ -O3 -march=native

number of elements: 1e+03
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "dyp":
    duration: 0 ms and 301 microseconds
    comparison to reference solution: exact match
solution "Nim":
    duration: 2 ms and 160 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 8 ms and 126 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 1 ms and 102 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 2 ms and 550 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 17 ms and 469 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 1 ms and 92 microseconds
    comparison to reference solution: exact match

==========================================================

number of elements: 1e+04
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "nyarlathotep":
    duration: 109 ms and 712 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 272 ms and 819 microseconds
    comparison to reference solution: exact match
solution "dyp":
    duration: 1 ms and 748 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 16 ms and 115 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 15 ms and 10 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 33 ms and 301 microseconds
    comparison to reference solution: exact match
solution "Nim":
    duration: 17 ms and 83 microseconds
    comparison to reference solution: exact match

==========================================================

number of elements: 1e+05
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "Nim":
    duration: 217 ms and 4 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 199 ms and 505 microseconds
    comparison to reference solution: exact match
solution "dyp":
    duration: 20 ms and 330 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 415 ms and 477 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 3955 ms and 58 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 215 ms and 259 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 1341 ms and 46 microseconds
    comparison to reference solution: mismatch found

==========================================================

number of elements: 1e+06
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "Lightness Races in Orbit":
    duration: 52861 ms and 314 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 4757 ms and 608 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 15654 ms and 195 microseconds
    comparison to reference solution: mismatch found
solution "dyp":
    duration: 233 ms and 779 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 2181 ms and 634 microseconds
    comparison to reference solution: exact match
solution "Nim":
    duration: 2539 ms and 9 microseconds
    comparison to reference solution: exact match
solution "notbad":
    duration: 2675 ms and 362 microseconds
    comparison to reference solution: exact match

==========================================================

number of elements: 1e+07
size of integer type: 4

reference solution: Lightness Races in Orbit

solution "notbad":
    duration: 33425 ms and 423 microseconds
    comparison to reference solution: exact match
solution "pts":
    duration: 26000 ms and 398 microseconds
    comparison to reference solution: exact match
solution "Eric Postpischil":
    duration: 56206 ms and 359 microseconds
    comparison to reference solution: exact match
solution "Lightness Races in Orbit":
    duration: 658540 ms and 342 microseconds
    comparison to reference solution: exact match
solution "nyarlathotep":
    duration: 187064 ms and 518 microseconds
    comparison to reference solution: mismatch found
solution "Nim":
    duration: 30519 ms and 227 microseconds
    comparison to reference solution: exact match
solution "dyp":
    duration: 2624 ms and 644 microseconds
    comparison to reference solution: exact match

The algorithms have to be structs with function-call operator templates that support the interface: 算法必须是具有支持接口的函数调用操作符模板的结构:

template<class RaIt> operator()(RaIt begin, RaIt end);

A copy of the input data is provided as a parameter, the algorithm is expected to provide the result in the same range (eg in-place sort). 提供输入数据的副本作为参数,期望算法在相同范围内提供结果(例如,就地排序)。

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>
#include <cstring>
#include <climits>
#include <functional>
#include <cstdlib>
#include <iomanip>

using duration_t = decltype(  std::chrono::high_resolution_clock::now()
                            - std::chrono::high_resolution_clock::now());

template<class T>
struct result_t
{
    std::vector<T> numbers;
    duration_t duration;
    char const* name;
};

template<class RaIt, class F>
result_t<typename std::iterator_traits<RaIt>::value_type>
apply_algorithm(RaIt p_beg, RaIt p_end, F f, char const* name)
{
    using value_type = typename std::iterator_traits<RaIt>::value_type;

    std::vector<value_type> inplace(p_beg, p_end);

    auto start = std::chrono::high_resolution_clock::now();

    f(begin(inplace), end(inplace));

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = end - start;

    return {std::move(inplace), duration, name};
}

// non-optimized version
int count_digits(int p) // returns `0` for `p == 0`
{
        int res = 0;
        for(; p != 0; ++res)
        {
            p /= 10;
        }
        return res;
}

// non-optimized version
int my_pow10(unsigned exp)
{
        int res = 1;
        for(; exp != 0; --exp)
        {
            res *= 10;
        }
        return res;
}


// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// paste algorithms here
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


int main(int argc, char** argv)
{
    using integer_t = int;
    constexpr integer_t dist_min = 0;
    constexpr integer_t dist_max = std::numeric_limits<integer_t>::max()/10;
    constexpr std::size_t default_number_of_elements = 1E6;

    const std::size_t number_of_elements = argc>1 ? std::atoll(argv[1]) :
                                           default_number_of_elements;
    std::cout << "number of elements: ";
    std::cout << std::scientific << std::setprecision(0);
    std::cout << (double)number_of_elements << "\n";
    std::cout << /*std::defaultfloat <<*/ std::setprecision(6);
    std::cout.unsetf(std::ios_base::floatfield);

    std::cout << "size of integer type: " << sizeof(integer_t) << "\n\n";

    std::vector<integer_t> input;
    {
        input.reserve(number_of_elements);

        std::random_device rd;
        std::mt19937 gen( rd() );
        std::uniform_int_distribution<> dist(dist_min, dist_max);

        for(std::size_t i = 0; i < number_of_elements; ++i)
            input.push_back( dist(gen) );
    }

    auto b = begin(input);
    auto e = end(input);

    using res_t = result_t<integer_t>;
    std::vector< std::function<res_t()> > algorithms;

    #define MAKE_BINDER(B, E, ALGO, NAME) \
        std::bind( &apply_algorithm<decltype(B),decltype(ALGO)>, \
                   B,E,ALGO,NAME )
    constexpr auto lightness_name = "Lightness Races in Orbit";
    algorithms.push_back( MAKE_BINDER(b, e, lightness(), lightness_name) );
    algorithms.push_back( MAKE_BINDER(b, e, dyp(), "dyp") );
    algorithms.push_back( MAKE_BINDER(b, e, nim(), "Nim") );
    algorithms.push_back( MAKE_BINDER(b, e, pts(), "pts") );
    algorithms.push_back( MAKE_BINDER(b, e, epost(), "Eric Postpischil") );
    algorithms.push_back( MAKE_BINDER(b, e, nyar(), "nyarlathotep") );
    algorithms.push_back( MAKE_BINDER(b, e, notbad(), "notbad") );

    {
        std::srand( std::random_device()() );
        std::random_shuffle(begin(algorithms), end(algorithms));
    }

    std::vector< result_t<integer_t> > res;
    for(auto& algo : algorithms)
        res.push_back( algo() );

    auto reference_solution
        = *std::find_if(begin(res), end(res),
                        [](result_t<integer_t> const& p)
                        { return 0 == std::strcmp(lightness_name, p.name); });
    std::cout << "reference solution: "<<reference_solution.name<<"\n\n";

    for(auto const& e : res)
    {
        std::cout << "solution \""<<e.name<<"\":\n";
        auto ms =
            std::chrono::duration_cast<std::chrono::microseconds>(e.duration);
        std::cout << "\tduration: "<<ms.count()/1000<<" ms and "
                                   <<ms.count()%1000<<" microseconds\n";

        std::cout << "\tcomparison to reference solution: ";
        if(e.numbers.size() != reference_solution.numbers.size())
        {
            std::cout << "ouput count mismatch\n";
            break;
        }

        auto mismatch = std::mismatch(begin(e.numbers), end(e.numbers),
                                      begin(reference_solution.numbers)).first;
        if(end(e.numbers) == mismatch)
        {
            std::cout << "exact match\n";
        }else
        {
            std::cout << "mismatch found\n";
        }
    }
}

Current algorithms; 目前的算法; note I replaced the digit counters and pow-of-10 with the global function, so we all benefit if someone optimizes. 注意我用全局函数替换了数字计数器和pow-of-10,所以如果有人优化我们都会受益。

struct lightness
{
    template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        using T = typename std::iterator_traits<RaIt>::value_type;

        /**
         * Sorts the array lexicographically.
         *
         * The trick is that we have to compare digits left-to-right
         * (considering typical Latin decimal notation) and that each of
         * two numbers to compare may have a different number of digits.
         *
         * This is very efficient in storage space, but inefficient in
         * execution time; an approach that pre-visits each element and
         * stores a translated representation will at least double your
         * storage requirements (possibly a problem with large inputs)
         * but require only a single translation of each element.
         */
        std::sort(
            b,
            e,
            [](T lhs, T rhs) -> bool {
                // Returns true if lhs < rhs
                // Returns false otherwise
                const auto BASE      = 10;
                const bool LHS_FIRST = true;
                const bool RHS_FIRST = false;
                const bool EQUAL     = false;


                // There's no point in doing anything at all
                // if both inputs are the same; strict-weak
                // ordering requires that we return `false`
                // in this case.
                if (lhs == rhs) {
                    return EQUAL;
                }

                // Compensate for sign
                if (lhs < 0 && rhs < 0) {
                    // When both are negative, sign on its own yields
                    // no clear ordering between the two arguments.
                    //
                    // Remove the sign and continue as for positive
                    // numbers.
                    lhs *= -1;
                    rhs *= -1;
                }
                else if (lhs < 0) {
                    // When the LHS is negative but the RHS is not,
                    // consider the LHS "first" always as we wish to
                    // prioritise the leading '-'.
                    return LHS_FIRST;
                }
                else if (rhs < 0) {
                    // When the RHS is negative but the LHS is not,
                    // consider the RHS "first" always as we wish to
                    // prioritise the leading '-'.
                    return RHS_FIRST;
                }

                // Counting the number of digits in both the LHS and RHS
                // arguments is *almost* trivial.
                const auto lhs_digits = (
                    lhs == 0
                    ? 1
                    : std::ceil(std::log(lhs+1)/std::log(BASE))
                );

                const auto rhs_digits = (
                    rhs == 0
                    ? 1
                    : std::ceil(std::log(rhs+1)/std::log(BASE))
                );

                // Now we loop through the positions, left-to-right,
                // calculating the digit at these positions for each
                // input, and comparing them numerically. The
                // lexicographic nature of the sorting comes from the
                // fact that we are doing this per-digit comparison
                // rather than considering the input value as a whole.
                const auto max_pos = std::max(lhs_digits, rhs_digits);
                for (auto pos = 0; pos < max_pos; pos++) {
                    if (lhs_digits - pos == 0) {
                        // Ran out of digits on the LHS;
                        // prioritise the shorter input
                        return LHS_FIRST;
                    }
                    else if (rhs_digits - pos == 0) {
                        // Ran out of digits on the RHS;
                        // prioritise the shorter input
                        return RHS_FIRST;
                    }
                    else {
                        const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                        const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                        if (lhs_x < rhs_x)
                            return LHS_FIRST;
                        else if (rhs_x < lhs_x)
                            return RHS_FIRST;
                    }
                }

                // If we reached the end and everything still
                // matches up, then something probably went wrong
                // as I'd have expected to catch this in the tests
                // for equality.
                assert("Unknown case encountered");

                // dyp: suppress warning and throw
                throw "up";
            }
        );
    }
};

namespace ndyp
{
    // helper to provide integers with the same number of digits
    template<class T, class U>
    std::pair<T, T> lexicographic_pair_helper(T const p, U const maxDigits)
    {
        auto const digits = count_digits(p);
        // append zeros so that `l` has `maxDigits` digits
        auto const l = static_cast<T>( p  * my_pow10(maxDigits-digits) );
        return {l, p};
    }

    template<class RaIt>
    using pair_vec
        = std::vector<std::pair<typename std::iterator_traits<RaIt>::value_type,
                                typename std::iterator_traits<RaIt>::value_type>>;

    template<class RaIt>
    pair_vec<RaIt> lexicographic_sort(RaIt p_beg, RaIt p_end)
    {
        if(p_beg == p_end) return pair_vec<RaIt>{};

        auto max = *std::max_element(p_beg, p_end);
        auto maxDigits = count_digits(max);

        pair_vec<RaIt> result;
        result.reserve( std::distance(p_beg, p_end) );

        for(auto i = p_beg; i != p_end; ++i)
            result.push_back( lexicographic_pair_helper(*i, maxDigits) );

        using value_type = typename pair_vec<RaIt>::value_type;

        std::sort(begin(result), end(result),
                  [](value_type const& l, value_type const& r)
                  {
                      if(l.first < r.first) return true;
                      if(l.first > r.first) return false;
                      return l.second < r.second; }
                 );

        return result;
    }
}

struct dyp
{
    template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        auto pairvec = ndyp::lexicographic_sort(b, e);
        std::transform(begin(pairvec), end(pairvec), b,
                       [](typename decltype(pairvec)::value_type const& e) { return e.second; });
    }
};


namespace nnim
{
    bool comp(int l, int r)
    {
      int lv[10] = {}; // probably possible to get this from numeric_limits
      int rv[10] = {};

      int lc = 10; // ditto
      int rc = 10;
      while (l || r)
      {
        if (l)
        {
          auto t = l / 10;
          lv[--lc] = l - (t * 10);
          l = t;
        }
        if (r)
        {
          auto t = r / 10;
          rv[--rc] = r - (t * 10);
          r = t;
        }
      }
      while (lc < 10 && rc < 10)
      {
        if (lv[lc] == rv[rc])
        {
          lc++;
          rc++;
        }
        else
          return lv[lc] < rv[rc];
      }
      return lc > rc;
    }
}

struct nim
{
    template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, nnim::comp);
    }
};

struct pts
{
        template<class T> static bool lex_less(T a, T b) {
          unsigned la = 1, lb = 1;
          for (T t = a; t > 9; t /= 10) ++la;
          for (T t = b; t > 9; t /= 10) ++lb;
          const bool ll = la < lb;
          while (la > lb) { b *= 10; ++lb; }
          while (lb > la) { a *= 10; ++la; }
          return a == b ? ll : a < b;
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, lex_less<typename std::iterator_traits<RaIt>::value_type>);
    }
};

struct epost
{
        static bool compare(int x, int y)
        {
                static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));

                double lx = log10(x);
                double ly = log10(y);
                double fx = lx - floor(lx);  // Get the mantissa of lx.
                double fy = ly - floor(ly);  // Get the mantissa of ly.
                return fabs(fx - fy) < limit ? lx < ly : fx < fy;
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, compare);
    }
};

struct nyar
{
        static bool lexiSmaller(int i1, int i2)
        {
            int digits1 = count_digits(i1);
            int digits2 = count_digits(i2);

            double val1 = i1/pow(10.0, digits1-1);
            double val2 = i2/pow(10.0, digits2-1);

            while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
            {
                digits1--;
                digits2--;
                val1 = (val1 - (int)val1)*10;
                val2 = (val2 - (int)val2)*10;
            }
            if (digits1 > 0 && digits2 > 0)
            {
                return (int)val1 < (int)val2;
            }
            return (digits2 > 0);
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, lexiSmaller);
    }
};

struct notbad
{
        static int up_10pow(int n) {
          int ans = 1;
          while (ans < n) ans *= 10;
          return ans;
        }

        static bool compare(int v1, int v2) {
           int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
           while ( ceil1 != 0 && ceil2 != 0) {
              if (v1 / ceil1  < v2 / ceil2) return true;
              else if (v1 / ceil1 > v2 / ceil2) return false;
              ceil1 /= 10;
              ceil2 /= 10;
           }
           if (v1 < v2) return true;
           return false;
        }

        template<class RaIt> void operator()(RaIt b, RaIt e)
    {
        std::sort(b, e, compare);
    }
};

I believe the following works as a sort comparison function for positive integers provided the integer type used is substantially narrower than the double type (eg, 32-bit int and 64-bit double ) and the log10 routine used returns exactly correct results for exact powers of 10 (which a good implementation does): 我相信以下作为正整数的排序比较函数,只要使用的整数类型比double类型(例如,32位int和64位double )窄,并且使用的log10例程返回完全正确的结果以获得精确的幂10(这是一个很好的实现):

static const double limit = .5 * (log(INT_MAX) - log(INT_MAX-1));

double lx = log10(x);
double ly = log10(y);
double fx = lx - floor(lx);  // Get the mantissa of lx.
double fy = ly - floor(ly);  // Get the mantissa of ly.
return fabs(fx - fy) < limit ? lx < ly : fx < fy;

It works by comparing the mantissas of the logarithms. 它通过比较对数的尾数来工作。 The mantissas are the fractional parts of the logarithm, and they indicate the value of the significant digits of a number without the magnitude (eg, the logarithms of 31, 3.1, and 310 have exactly the same mantissa). 尾数是对数的小数部分,它们表示没有大小的数字的有效数字的值(例如,31,3.1和310的对数具有完全相同的尾数)。

The purpose of fabs(fx - fy) < limit is to allow for errors in taking the logarithm, which occur both because implementations of log10 are imperfect and because the floating-point format forces some error. fabs(fx - fy) < limit是允许采用对数时出错,因为log10实现不完美,并且因为浮点格式会导致一些错误。 (The integer portions of the logarithms of 31 and 310 use different numbers of bits, so there are different numbers of bits left for the significand, so they end up being rounded to slightly different values.) As long as the integer type is substantially narrower than the double type, the calculated limit will be much larger than the error in log10 . (31和310的对数的整数部分使用不同的位数,因此有效数字的位数不同,因此它们最终被四舍五入到略微不同的值。)只要整数类型基本上更窄比double类型,计算的limit将远大于log10的错误。 Thus, the test fabs(fx - fy) < limit essentially tells us whether two calculated mantissas would be equal if calculated exactly. 因此,测试fabs(fx - fy) < limit基本上告诉我们如果精确计算,两个计算的尾数是否相等。

If the mantissas differ, they indicate the lexicographic order, so we return fx < fy . 如果尾数不同,它们表示字典顺序,所以我们返回fx < fy If they are equal, then the integer portion of the logarithm tells us the order, so we return lx < ly . 如果它们相等,那么对数的整数部分告诉我们顺序,所以我们返回lx < ly

It is simple to test whether log10 returns correct results for every power of ten, since there are so few of them. 测试log10是否为每个10的幂返回正确的结果很简单,因为它们很少。 If it does not, adjustments can be made easily: Insert if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0; 如果没有,则可以轻松进行调整:插入if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0; if (1-fx < limit) fx = 0; if (1-fu < limit) fy = 0; . This allows for when log10 returns something like 4.99999… when it should have returned 5. 这允许当log10返回类似于4.99999的内容时...它应该返回5。

This method has the advantage of not using loops or division (which is time-consuming on many processors). 该方法具有不使用循环或除法的优点(这在许多处理器上是耗时的)。

The task sounds like a natural fit for an MSD variant of Radix Sort with padding ( http://en.wikipedia.org/wiki/Radix_sort ). 这个任务听起来像是一个自然适合Radix Sort with padding的MSD变体( http://en.wikipedia.org/wiki/Radix_sort )。

Depends on how much code you want to throw at it. 取决于你想要多少代码。 The simple code as the others show is O(log n) complexity, while a fully optimized radix sort would be O(kn). 其他显示的简单代码是O(log n)复杂度,而完全优化的基数排序则是O(kn)。

A compact solution if all your numbers are nonnegative and they are small enough so that multiplying them by 10 doesn't cause an overflow: 一个紧凑的解决方案,如果你的所有数字都是非负数且它们足够小以便将它们乘以10不会导致溢出:

template<class T> bool lex_less(T a, T b) {
  unsigned la = 1, lb = 1;
  for (T t = a; t > 9; t /= 10) ++la;
  for (T t = b; t > 9; t /= 10) ++lb;
  const bool ll = la < lb;
  while (la > lb) { b *= 10; ++lb; }
  while (lb > la) { a *= 10; ++la; }
  return a == b ? ll : a < b;
}

Run it like this: 像这样运行:

#include <iostream>
#include <algorithm>
int main(int, char **) {
  unsigned short input[] = { 100, 21 , 22 , 99 , 1  , 927 };
  unsigned input_size = sizeof(input) / sizeof(input[0]);
  std::sort(input, input + input_size, lex_less<unsigned short>);
  for (unsigned i = 0; i < input_size; ++i) {
    std::cout << ' ' << input[i];
  }
  std::cout << std::endl;
  return 0;
}

You could try using the % operator to give you access to each individual digit eg 121 % 100 will give you the first digit and check that way but you'll have to find a way to get around the fact they have different sizes. 您可以尝试使用%运算符来访问每个单独的数字,例如121%100将为您提供第一个数字并检查该方式,但您必须找到一种方法来解决它们具有不同大小的事实。

So find the maximum value in array. 所以在数组中找到最大值。 I don't know if theres a function for this in built you could try. 我不知道你是否可以尝试使用这个功能。

int Max (int* pdata,int size)
 {
int temp_max =0 ;

for (int i =0 ; i < size ; i++)
 {
    if (*(pdata+i) > temp_max)
    {
        temp_max = *(pdata+i);

    }
 }
 return temp_max;
 }

This function will return the number of digits in the number 此函数将返回数字中的位数

 int Digit_checker(int n)
{
 int num_digits = 1;

while (true)
{
    if ((n % 10) == n)
        return num_digits;
    num_digits++;
    n = n/10;
}
return num_digits;
}

Let number of digits in max equal n. 令max中的位数等于n。 Once you have this open a for loop in the format of for (int i = 1; i < n ; i++) 一旦你打开一个for循环的格式为for(int i = 1; i <n; i ++)

then you can go through your and use "data[i] % (10^(ni))" to get access to the first digit then sort that and then on the next iteration you'll get access to the second digit. 那么你可以浏览你的并使用“data [i]%(10 ^(ni))”来访问第一个数字,然后对其进行排序,然后在下一次迭代中你将获得第二个数字的访问权限。 I Don't know how you'll sort them though. 我不知道你怎么对它们进行排序。

It wont work for negative numbers and you'll have to get around data[i] % (10^(ni)) returning itself for numbers with less digits than max 它不会对负数工作,你必须绕过数据[i]%(10 ^(ni))返回数字少于最大数字的数字

Overload the < operator to compare two integers lexicographically. 重载<运算符以按字典顺序比较两个整数。 For each integer, find the smallest 10^k, which is not less than the given integer. 对于每个整数,找到最小的10 ^ k,它不小于给定的整数。 Than compare the digits one by one. 然后逐个比较数字。

class CmpIntLex {
int up_10pow(int n) {
  int ans = 1;
  while (ans < n) ans *= 10;
  return ans;
}
public: 
bool operator ()(int v1, int v2) {
   int ceil1 = up_10pow(v1), ceil2 = up_10pow(v2);
   while ( ceil1 != 0 && ceil2 != 0) {
      if (v1 / ceil1  < v2 / ceil2) return true;
      else if (v1 / ceil1 > v2 / ceil2) return false;
      ceil1 /= 10; 
      ceil2 /= 10;
   }
   if (v1 < v2) return true;
   return false;
}
int main() {
vector<int> vi = {12,45,12134,85};
sort(vi.begin(), vi.end(), CmpIntLex());
}

While some other answers here (Lightness's, notbad's) are already showing quite good code, I believe I can add one solution which might be more performant (since it requires neither division nor power in each loop; but it requires floating point arithmetic, which again might make it slow, and possibly inaccurate for large numbers): 虽然这里的一些其他答案(Lightness,notbad)已经显示出相当不错的代码,但我相信我可以添加一个可能更高效的解决方案(因为它在每个循环中既不需要除法也不需要幂;但它需要浮点运算,这又一次可能会使它变慢,并且对于大数字可能不准确):

#include <algorithm>
#include <iostream>
#include <assert.h>

// method taken from http://stackoverflow.com/a/1489873/671366
template <class T>
int numDigits(T number)
{
    int digits = 0;
    if (number < 0) digits = 1; // remove this line if '-' counts as a digit
    while (number) {
        number /= 10;
        digits++;
    }
    return digits;
}

bool lexiSmaller(int i1, int i2)
{
    int digits1 = numDigits(i1);
    int digits2 = numDigits(i2);

    double val1 = i1/pow(10.0, digits1-1);
    double val2 = i2/pow(10.0, digits2-1);

    while (digits1 > 0 && digits2 > 0 && (int)val1 == (int)val2)
    {
        digits1--;
        digits2--;
        val1 = (val1 - (int)val1)*10;
        val2 = (val2 - (int)val2)*10;
    }
    if (digits1 > 0 && digits2 > 0)
    {
        return (int)val1 < (int)val2;
    }
    return (digits2 > 0);
}


int main(int argc, char* argv[])
{
    // just testing whether the comparison function works as expected:
    assert (lexiSmaller(1, 100));
    assert (!lexiSmaller(100, 1));
    assert (lexiSmaller(100, 22));
    assert (!lexiSmaller(22, 100));
    assert (lexiSmaller(927, 99));
    assert (!lexiSmaller(99, 927));
    assert (lexiSmaller(1, 927));
    assert (!lexiSmaller(927, 1));
    assert (lexiSmaller(21, 22));
    assert (!lexiSmaller(22, 21));
    assert (lexiSmaller(22, 99));
    assert (!lexiSmaller(99, 22));

    // use the comparison function for the actual sorting:
    int input[] = { 100 , 21 , 22 , 99 , 1 ,927 };
    std::sort(&input[0], &input[5], lexiSmaller);
    std::cout << "sorted: ";
    for (int i=0; i<6; ++i)
    {
        std::cout << input[i];
        if (i<5)
        {
            std::cout << ", ";
        }
    }
    std::cout << std::endl;
    return 0;
}

Though I have to admit I haven't tested the performance yet. 虽然我不得不承认我还没有测试过性能。

Here is the dumb solution that doesn't use any floating point tricks. 这是一个不使用任何浮点技巧的愚蠢解决方案。 It's pretty much the same as the string comparison, but doesn't use a string per say, doesn't also handle negative numbers, to do that add a section at the top... 它与字符串比较几乎相同,但不是每个说都使用字符串,也不处理负数,为此在顶部添加一个部分...

bool comp(int l, int r)
{
  int lv[10] = {}; // probably possible to get this from numeric_limits
  int rv[10] = {};

  int lc = 10; // ditto
  int rc = 10;
  while (l || r)
  {
    if (l)
    {
      auto t = l / 10;
      lv[--lc] = l - (t * 10);
      l = t;
    }
    if (r)
    {
      auto t = r / 10;
      rv[--rc] = r - (t * 10);
      r = t;
    }
  }
  while (lc < 10 && rc < 10)
  {
    if (lv[lc] == rv[rc])
    {
      lc++;
      rc++;
    }
    else
      return lv[lc] < rv[rc];
  }
  return lc > rc;
}

It's fast, and I'm sure it's possible to make it faster still, but it works and it's dumb enough to understand... 这很快,而且我确信它可以让它更快,但它有效并且它足够愚蠢地理解......

EDIT: I ate to dump lots of code, but here is a comparison of all the solutions so far.. 编辑:我吃掉了很多代码,但到目前为止,这里是所有解决方案的比较。

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <random>
#include <vector>
#include <utility>
#include <cmath>
#include <cassert>
#include <chrono>

std::pair<int, int> lexicographic_pair_helper(int p, int maxDigits)
{
  int digits = std::log10(p);
  int l = p*std::pow(10, maxDigits-digits);
  return {l, p};
}

bool l_comp(int l, int r)
{
  int lv[10] = {}; // probably possible to get this from numeric_limits
  int rv[10] = {};

  int lc = 10; // ditto
  int rc = 10;
  while (l || r)
  {
    if (l)
    {
      auto t = l / 10;
      lv[--lc] = l - (t * 10);
      l = t;
    }
    if (r)
    {
      auto t = r / 10;
      rv[--rc] = r - (t * 10);
      r = t;
    }
  }
  while (lc < 10 && rc < 10)
  {
    if (lv[lc] == rv[rc])
    {
      lc++;
      rc++;
    }
    else
      return lv[lc] < rv[rc];
  }
  return lc > rc;
}

int up_10pow(int n) {
  int ans = 1;
  while (ans < n) ans *= 10;
  return ans;
}
bool l_comp2(int v1, int v2) {
  int n1 = up_10pow(v1), n2 = up_10pow(v2);
  while ( v1 != 0 && v2 != 0) {
    if (v1 / n1  < v2 / n2) return true;
    else if (v1 / n1 > v2 / n2) return false;
    v1 /= 10;
    v2 /= 10;
    n1 /= 10;
    n2 /= 10;
  }
  if (v1 == 0 && v2 != 0) return true;
  return false;
}

int main()
{
  std::vector<int> numbers;
  {
    constexpr int number_of_elements = 1E6;
    std::random_device rd;
    std::mt19937 gen( rd() );
    std::uniform_int_distribution<> dist;
    for(int i = 0; i < number_of_elements; ++i) numbers.push_back( dist(gen) );
  }

  std::vector<int> lo(numbers);
  std::vector<int> dyp(numbers);
  std::vector<int> nim(numbers);
  std::vector<int> nb(numbers);

  std::cout << "starting..." << std::endl;

  {

    auto start = std::chrono::high_resolution_clock::now();
    /**
    * Sorts the array lexicographically.
    *
    * The trick is that we have to compare digits left-to-right
    * (considering typical Latin decimal notation) and that each of
    * two numbers to compare may have a different number of digits.
    *
    * This probably isn't very efficient, so I wouldn't do it on
    * "millions" of numbers. But, it works...
    */
    std::sort(
    std::begin(lo),
              std::end(lo),
              [](int lhs, int rhs) -> bool {
                // Returns true if lhs < rhs
                // Returns false otherwise
                const auto BASE      = 10;
                const bool LHS_FIRST = true;
                const bool RHS_FIRST = false;
                const bool EQUAL     = false;


                // There's no point in doing anything at all
                // if both inputs are the same; strict-weak
                // ordering requires that we return `false`
                // in this case.
                if (lhs == rhs) {
                  return EQUAL;
                }

                // Compensate for sign
                if (lhs < 0 && rhs < 0) {
                  // When both are negative, sign on its own yields
                  // no clear ordering between the two arguments.
                  //
                  // Remove the sign and continue as for positive
                  // numbers.
                  lhs *= -1;
                  rhs *= -1;
                }
                else if (lhs < 0) {
                  // When the LHS is negative but the RHS is not,
              // consider the LHS "first" always as we wish to
              // prioritise the leading '-'.
              return LHS_FIRST;
                }
                else if (rhs < 0) {
                  // When the RHS is negative but the LHS is not,
              // consider the RHS "first" always as we wish to
              // prioritise the leading '-'.
              return RHS_FIRST;
                }

                // Counting the number of digits in both the LHS and RHS
                // arguments is *almost* trivial.
                const auto lhs_digits = (
                lhs == 0
                ? 1
                : std::ceil(std::log(lhs+1)/std::log(BASE))
                );

                const auto rhs_digits = (
                rhs == 0
                ? 1
                : std::ceil(std::log(rhs+1)/std::log(BASE))
                );

                // Now we loop through the positions, left-to-right,
              // calculating the digit at these positions for each
              // input, and comparing them numerically. The
              // lexicographic nature of the sorting comes from the
              // fact that we are doing this per-digit comparison
              // rather than considering the input value as a whole.
              const auto max_pos = std::max(lhs_digits, rhs_digits);
              for (auto pos = 0; pos < max_pos; pos++) {
                if (lhs_digits - pos == 0) {
                  // Ran out of digits on the LHS;
                  // prioritise the shorter input
                  return LHS_FIRST;
                }
                else if (rhs_digits - pos == 0) {
                  // Ran out of digits on the RHS;
                  // prioritise the shorter input
                  return RHS_FIRST;
                }
                else {
                  const auto lhs_x = (lhs / static_cast<decltype(BASE)>(std::pow(BASE, lhs_digits - 1 - pos))) % BASE;
                  const auto rhs_x = (rhs / static_cast<decltype(BASE)>(std::pow(BASE, rhs_digits - 1 - pos))) % BASE;

                  if (lhs_x < rhs_x)
                    return LHS_FIRST;
                  else if (rhs_x < lhs_x)
                    return RHS_FIRST;
                }
              }

              // If we reached the end and everything still
              // matches up, then something probably went wrong
              // as I'd have expected to catch this in the tests
              // for equality.
              assert("Unknown case encountered");
              }
              );

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    std::cout << "Lightness: " << elapsed.count() << '\n';
  }

  {
    auto start = std::chrono::high_resolution_clock::now();

    auto max = *std::max_element(begin(dyp), end(dyp));
    int maxDigits = std::log10(max);

    std::vector<std::pair<int,int>> temp;
    temp.reserve(dyp.size());
    for(auto const& e : dyp) temp.push_back( lexicographic_pair_helper(e, maxDigits) );

    std::sort(begin(temp), end(temp), [](std::pair<int, int> const& l, std::pair<int, int> const& r)
    { if(l.first < r.first) return true; if(l.first > r.first) return false; return l.second < r.second; });

    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    std::cout << "Dyp: " << elapsed.count() << '\n';
  }

  {
    auto start = std::chrono::high_resolution_clock::now();
    std::sort (nim.begin(), nim.end(), l_comp);
    auto end = std::chrono::high_resolution_clock::now();
    auto elapsed = end - start;
    std::cout << "Nim: " << elapsed.count() << '\n';
  }

//   {
//     auto start = std::chrono::high_resolution_clock::now();
//     std::sort (nb.begin(), nb.end(), l_comp2);
//     auto end = std::chrono::high_resolution_clock::now();
//     auto elapsed = end - start;
//     std::cout << "notbad: " << elapsed.count() << '\n';
//   }

  std::cout << (nim == lo) << std::endl;
  std::cout << (nim == dyp) << std::endl;
  std::cout << (lo == dyp) << std::endl;
//   std::cout << (lo == nb) << std::endl;
}

Based on @Oswald's answer, below is some code that does the same. 根据@Oswald的回答,下面是一些相同的代码。

#include <iostream>
#include <vector>
#include <algorithm> 
using namespace std;

bool compare(string a, string b){
    // Check each digit
    int i = 0, j = 0;
    while(i < a.size() && j < b.size()){
        // If different digits
        if(a[i] - '0' != b[j] - '0')
            return (a[i] - '0' < b[j] - '0');
        i++, j++;
    }
    // Different sizes
    return (a.size() < b.size());
}

int main(){
    vector<string> array = {"1","2","3","4","5","6","7","8","9","10","11","12"};
    sort(array.begin(), array.end(), compare);

    for(auto value : array)
        cout << value << " ";
    return 0;
}

Input: 1 2 3 4 5 6 7 8 9 10 11 12 输入:1 2 3 4 5 6 7 8 9 10 11 12

Output: 1 10 11 12 2 3 4 5 6 7 8 9 输出:1 10 11 12 2 3 4 5 6 7 8 9

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM