C++ comparison function for "version sort"?
I see "Compare two string as numeric value", and I can look at the source for the sort command-line utility. But to my surprise, I haven't found in Boost or on StackOverflow a reference implementation of a generic string comparison for what man sort calls --version-sort: sorting strings with embedded numbers.
I understand there are locale issues if we want to do a full numeric sort with signs, separators, and decimals, but version sort seems much easier (and it addresses the common case of sequential file names that aren't zero-padded). Is there a common implementation of something like this?
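To make the problem concrete, here is a minimal sketch of what goes wrong with the default ordering. The file names and the helper name sorted_lexicographically are made up for illustration; the point is only that a plain character-by-character sort places file10.txt before file2.txt, whereas a version sort would give file1.txt, file2.txt, file10.txt.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: sorts a copy of the names with the default operator<,
// i.e. a plain character-by-character comparison.
std::vector<std::string> sorted_lexicographically(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}
```

Called with {"file10.txt", "file2.txt", "file1.txt"}, this returns file1.txt, file10.txt, file2.txt, because '1' sorts before '2' regardless of the digits that follow.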
If not, a naive generic implementation supporting just unsigned integers seems like it would be: find the first digit span in each string, std::lexicographical_compare_three_way the corresponding leading ranges, and, if those are equal, compare the digit spans numerically and repeat on the remainder. The digit-span portion is something like:
#include <algorithm>
#include <cassert>
#include <cctype>  // For std::isdigit
#include <compare>
#include <concepts>
#include <iterator>
#include <string_view>
#include <type_traits>

struct is_digit_t {
    // See https://en.cppreference.com/w/cpp/string/byte/isdigit
    [[nodiscard]] bool operator()(unsigned char c) const {
        return std::isdigit(c) != 0;
    }
    [[nodiscard]] bool operator()(const std::integral auto& c) const {
        if (c != static_cast<std::decay_t<decltype(c)>>(static_cast<unsigned char>(c))) {
            return false;  // Has to be itself as an unsigned char.
        }
        return (*this)(static_cast<unsigned char>(c));
    }
};

template <typename It0, typename It1, typename IsDigit = is_digit_t>
std::strong_ordering compare_strings_containing_only_digits_three_way(
    It0 b0, It0 e0,
    It1 b1, It1 e1,
    [[maybe_unused]] IsDigit is_digit = {})
{
    assert(std::all_of(b0, e0, is_digit));
    assert(std::all_of(b1, e1, is_digit));
    // Skip leading zeros: find the first nonzero digit in each range.
    auto nonzero = [](const auto& c) { return c != '0'; };
    const auto bnz0 = std::find_if(b0, e0, nonzero);
    const auto bnz1 = std::find_if(b1, e1, nonzero);
    const auto s0 = std::distance(bnz0, e0);
    const auto s1 = std::distance(bnz1, e1);
    if (s0 != s1) {
        // More significant digits => larger number.
        return s0 < s1 ? std::strong_ordering::less : std::strong_ordering::greater;
    }
    // Same number of significant digits => lexicographical compare is numerical compare:
    const auto numerical = std::lexicographical_compare_three_way(bnz0, e0, bnz1, e1);
    if (numerical != 0) {
        return numerical;
    }
    // Tiebreaker on the leading zeros: "1" < "01" < "001" < "2":
    return std::lexicographical_compare_three_way(b0, bnz0, b1, bnz1);
}

template <typename It0, typename It1, typename IsDigit = is_digit_t>
std::strong_ordering compare_strings_containing_unsigned_integers_three_way(
    It0 b0, It0 e0,
    It1 b1, It1 e1,
    IsDigit is_digit = {})
{
    if (b0 == e0 && b1 == e1) {  // Totally empty => equal.
        return std::strong_ordering::equal;
    }
    // Compare the non-digit prefixes first.
    const auto numStart0 = std::find_if(b0, e0, is_digit);
    const auto numStart1 = std::find_if(b1, e1, is_digit);
    if (const auto leadingCmp = std::lexicographical_compare_three_way(b0, numStart0, b1, numStart1);
        leadingCmp != std::strong_ordering::equal) {
        return leadingCmp;  // Don't have to worry about the numbers.
    }
    // Prefixes match: compare the digit spans that follow them.
    const auto numEnd0 = std::find_if_not(numStart0, e0, is_digit);
    const auto numEnd1 = std::find_if_not(numStart1, e1, is_digit);
    if (auto numberCmp = compare_strings_containing_only_digits_three_way(numStart0, numEnd0, numStart1, numEnd1, is_digit);
        numberCmp != std::strong_ordering::equal) {
        return numberCmp;  // Number-part tie-broke it.
    }
    // Recursively deal with everything after the first matched numbers:
    return compare_strings_containing_unsigned_integers_three_way(numEnd0, e0, numEnd1, e1, is_digit);
}
https://godbolt.org/z/eEKbMj9vj
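For readers who only need to sort std::string values, the same idea can be condensed into a non-generic "less than" predicate. This is a sketch under stated assumptions, not a reference implementation: the names is_ascii_digit and version_less are hypothetical, only ASCII digits are recognised, and digit runs that differ only in leading zeros ("1" vs "01") are treated as equivalent rather than tie-broken.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: true if c is a decimal digit.
inline bool is_ascii_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical predicate: digit runs compare by numeric value, everything
// else compares character by character. Assumption: no tiebreak on leading
// zeros, so "a1" and "a01" are equivalent.
bool version_less(std::string_view a, std::string_view b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (is_ascii_digit(a[i]) && is_ascii_digit(b[j])) {
            // Find the end of each digit run.
            std::size_t ie = i, je = j;
            while (ie < a.size() && is_ascii_digit(a[ie])) ++ie;
            while (je < b.size() && is_ascii_digit(b[je])) ++je;
            // Skip leading zeros so that run length reflects magnitude.
            while (i < ie && a[i] == '0') ++i;
            while (j < je && b[j] == '0') ++j;
            // More significant digits means a larger number.
            if (ie - i != je - j) return (ie - i) < (je - j);
            // Same length: lexicographic order equals numeric order.
            const int cmp = a.substr(i, ie - i).compare(b.substr(j, je - j));
            if (cmp != 0) return cmp < 0;
            i = ie;
            j = je;
        } else {
            if (a[i] != b[j]) {
                return static_cast<unsigned char>(a[i]) < static_cast<unsigned char>(b[j]);
            }
            ++i;
            ++j;
        }
    }
    // One string ran out first: the one with nothing left sorts first.
    return (a.size() - i) < (b.size() - j);
}
```

It can be passed directly as the comparator, e.g. std::sort(v.begin(), v.end(), version_less). Because digits are contiguous in ASCII, every non-digit character compares the same way against any digit, so treating equal-valued runs as equivalent still gives a strict weak ordering.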
So...
std::sort requires O(N·log(N)) comparisons. Compare that to the O(N) cost of transforming each string individually into a number that can be compared via the built-in <.
Instead of embedding the transformation in the comparison, you should consider populating a std::vector<std::pair<int, std::string>>, sorting that via a simple std::sort(v.begin(), v.end()), and finally extracting the original strings to get the desired output.
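A minimal sketch of that approach might look like the following. The key function first_number and the wrapper sort_by_first_number are hypothetical names introduced here, and the sketch assumes each string is ordered by a single embedded number that fits in an int. That is narrower than a full version sort: strings with different prefixes or several numbers would need a richer key.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <charconv>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key function: value of the first run of digits in s.
// Returns -1 if there is no digit run or the value does not fit in an int.
int first_number(const std::string& s) {
    const auto is_digit = [](unsigned char c) { return std::isdigit(c) != 0; };
    const auto first = std::find_if(s.begin(), s.end(), is_digit);
    const auto last = std::find_if_not(first, s.end(), is_digit);
    int value = -1;
    // from_chars leaves value untouched on failure (empty range or overflow).
    std::from_chars(s.data() + (first - s.begin()), s.data() + (last - s.begin()), value);
    return value;
}

// Decorate, sort, undecorate: each string is transformed exactly once.
std::vector<std::string> sort_by_first_number(const std::vector<std::string>& names) {
    std::vector<std::pair<int, std::string>> keyed;
    keyed.reserve(names.size());
    for (const auto& name : names) {
        keyed.emplace_back(first_number(name), name);  // N transformations in total
    }
    // std::pair's operator< compares the int key first, then the string.
    std::sort(keyed.begin(), keyed.end());
    std::vector<std::string> result;
    result.reserve(keyed.size());
    for (auto& entry : keyed) {
        result.push_back(std::move(entry.second));
    }
    return result;
}
```

The design trade-off is extra memory for the keys in exchange for cheap integer comparisons inside the sort; strings with equal keys fall back to ordinary string comparison.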