C++ comparison function for "version sort"?

I see "Compare two string as numeric value", and I can look at the source of the sort command-line utility. But to my surprise, I haven't found a reference implementation in Boost or on StackOverflow of a generic string comparison for what man sort calls --version-sort, i.e. sorting strings with embedded numbers. I understand there are locale issues if we want a full numeric sort with signs, separators, and decimals. Version sort seems much easier, though, and it covers the common case of sequential file names that aren't zero-padded. Is there a common implementation of something like this?

If not, a naive generic implementation supporting just unsigned integers seems like it would be:

  1. Both empty => equal.
  2. Find the first digit in each range.
  3. Compare the corresponding leading (non-digit) ranges with std::lexicographical_compare_three_way.
  4. If they aren't equal, we are done.
  5. If they are equal, find the end of each digit span.
  6. Compare the corresponding digit spans numerically.
  7. Recursively compare the remainder of the ranges.

The digit-span comparison, together with the overall recursion, looks something like this:

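#include <algorithm>
#include <cassert>
#include <cctype>      // For std::isdigit
#include <compare>     // For std::strong_ordering
#include <concepts>    // For std::integral
#include <iterator>    // For std::distance
#include <string_view>
#include <type_traits> // For std::decay_t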


struct is_digit_t {
    // See https://en.cppreference.com/w/cpp/string/byte/isdigit
    [[nodiscard]] bool operator()(unsigned char c) const {
         return std::isdigit(c);
    }
    [[nodiscard]] bool operator()(const std::integral auto& c) const {
        if (c != static_cast<std::decay_t<decltype(c)>>(static_cast<unsigned char>(c))) {
            return false; // Has to be itself as an unsigned char.
        }
        return (*this)(static_cast<unsigned char>(c));
    }
};

template <typename It0, typename It1, typename IsDigit = is_digit_t>
std::strong_ordering compare_strings_containing_only_digits_three_way(
    It0 b0, It0 e0, 
    It1 b1, It1 e1,
    [[maybe_unused]] IsDigit is_digit = {})
{
    assert(std::all_of(b0, e0, is_digit));
    assert(std::all_of(b1, e1, is_digit));    
    // Skip leading zeros: bnz0/bnz1 point at the first non-'0' digit (or the end).
    auto nonzero = [](const auto& c) { return c != '0'; };
    const auto bnz0 = std::find_if(b0, e0, nonzero);
    const auto bnz1 = std::find_if(b1, e1, nonzero);
    const auto s0 = std::distance(bnz0, e0);
    const auto s1 = std::distance(bnz1, e1);
    if (s0 != s1) {
        return s0 < s1 ? std::strong_ordering::less : std::strong_ordering::greater;
    }
    // Same number of significant digits => lexicographical compare of them is numerical compare.
    // (Start at bnz*, not b*, so that e.g. "01" and "1" compare numerically equal here.)
    const auto numerical = std::lexicographical_compare_three_way(bnz0, e0, bnz1, e1);
    if (numerical != 0) {
        return numerical;
    }
    // Tiebreaker: "1" < "01" < "001" < "2":
    return std::lexicographical_compare_three_way(b0, bnz0, b1, bnz1);
}

template <typename It0, typename It1, typename IsDigit = is_digit_t>
std::strong_ordering compare_strings_containing_unsigned_integers_three_way(
    It0 b0, It0 e0, 
    It1 b1, It1 e1,
    IsDigit is_digit = {}) 
{
    if (b0 == e0 && b1 == e1) { // Totally empty => equal.
        return std::strong_ordering::equal;
    }
    const auto numStart0 = std::find_if(b0, e0, is_digit);
    const auto numStart1 = std::find_if(b1, e1, is_digit);

    
    if (const auto leadingCmp = std::lexicographical_compare_three_way(b0, numStart0, b1, numStart1);
        leadingCmp != std::strong_ordering::equal) {
        return leadingCmp; // Don't have to worry about the numbers.
    }
    const auto numEnd0 = std::find_if_not(numStart0, e0, is_digit);
    const auto numEnd1 = std::find_if_not(numStart1, e1, is_digit);
    if (auto numberCmp = compare_strings_containing_only_digits_three_way(numStart0, numEnd0, numStart1, numEnd1, is_digit); 
        numberCmp != std::strong_ordering::equal) {
        return numberCmp; // Number-part tie-broke it.
    }
    // Recursively deal with everything after the first matched numbers,
    // forwarding the (possibly custom) digit predicate:
    return compare_strings_containing_unsigned_integers_three_way(numEnd0, e0, numEnd1, e1, is_digit);
}

https://godbolt.org/z/eEKbMj9vj

So...

  • Does this seem right?
  • Is there a cleaner way?
  • Are there really no Boost algorithms for this?

std::sort requires O(N·log(N)) comparisons. Compare that to the O(N) cost of transforming each string once into a number that can then be compared with the built-in <.

Instead of embedding the transformation in the comparison, consider populating a std::vector<std::pair<int, std::string>>, sorting it with a simple std::sort(v.begin(), v.end()), and finally extracting the original strings to get the desired output.
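This is a minimal sketch of the decorate-sort-undecorate approach described above. It assumes each string carries a single relevant number, taken here as its first run of ASCII digits, and that this number fits in unsigned long long. The key type is unsigned long long rather than int to reduce overflow risk. The helpers first_number and sort_by_first_number are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <charconv>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical key extractor: the value of the FIRST run of ASCII digits in s.
// Returns 0 if s has no digits or the number does not fit (an assumption of this sketch).
unsigned long long first_number(const std::string& s) {
    const char* const first = s.data();
    const char* const last = first + s.size();
    const char* const digits = std::find_if(first, last, [](char c) { return c >= '0' && c <= '9'; });
    unsigned long long value = 0;
    const auto result = std::from_chars(digits, last, value);  // stops at the first non-digit
    return result.ec == std::errc{} ? value : 0;
}

// Decorate-sort-undecorate: each key is computed once (N extractions), and the
// O(N log N) comparisons use the built-in operator< of std::pair.
std::vector<std::string> sort_by_first_number(const std::vector<std::string>& names) {
    std::vector<std::pair<unsigned long long, std::string>> keyed;
    keyed.reserve(names.size());
    for (const auto& name : names) {
        keyed.emplace_back(first_number(name), name);
    }
    std::sort(keyed.begin(), keyed.end());  // by number, then by the whole string
    std::vector<std::string> sorted;
    sorted.reserve(keyed.size());
    for (auto& entry : keyed) {
        sorted.push_back(std::move(entry.second));
    }
    return sorted;
}
```

std::pair compares its second member when the first members are equal, so strings with the same number fall back to plain string order. That is also the limit of a single-number key. "v1.10" and "v1.9" both get the key 1 and then compare as plain strings, which puts "v1.10" first. Strings with several embedded numbers would therefore need a richer key, such as a sequence of text/number tokens.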
