
More silent behaviour changes with C++20 three-way comparison

To my surprise, I ran into another snag like the one in "C++20 behaviour breaking existing code with equality operator?".

Consider a simple case-insensitive key type, to be used with, e.g., std::set or std::map:

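#include <boost/algorithm/string.hpp>
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Represents case-insensitive keys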
struct CiKey : std::string {
    using std::string::string;
    using std::string::operator=;

    bool operator<(CiKey const& other) const {
        return boost::ilexicographical_compare(*this, other);
    }
};

Simple tests:

using KeySet   = std::set<CiKey>;
using Mapping  = std::pair<CiKey, int>; // Same with std::tuple
using Mappings = std::set<Mapping>;

int main()
{
    KeySet keys { "one", "two", "ONE", "three" };
    Mappings mappings {
        { "one", 1 }, { "two", 2 }, { "ONE", 1 }, { "three", 3 }
    };

    assert(keys.size() == 3);
    assert(mappings.size() == 3);
}
  • Using C++17, both asserts pass (Compiler Explorer).

  • Switching to C++20, the second assert fails (Compiler Explorer):

    output.s: ./example.cpp:28: int main(): Assertion `mappings.size() == 3' failed.


Obvious Workaround

An obvious workaround is to conditionally supply operator<=> in C++20 mode (Compiler Explorer):

#if defined(__cpp_lib_three_way_comparison)
    std::weak_ordering operator<=>(CiKey const& other) const {
        if (boost::ilexicographical_compare(*this, other)) {
            return std::weak_ordering::less;
        } else if (boost::ilexicographical_compare(other, *this)) {
            return std::weak_ordering::greater;
        }
        return std::weak_ordering::equivalent;
    }
#endif

Question

It surprises me that I ran into another breaking change where C++20 silently changes the behaviour of existing code, without any diagnostic.

On my reading of std::tuple's operator<, it should have worked:

3-6) Compares lhs and rhs lexicographically by operator<, that is, compares the first elements, if they are equivalent, compares the second elements, if those are equivalent, compares the third elements, and so on. For non-empty tuples, (3) is equivalent to

if (std::get<0>(lhs) < std::get<0>(rhs)) return true;
if (std::get<0>(rhs) < std::get<0>(lhs)) return false;
if (std::get<1>(lhs) < std::get<1>(rhs)) return true;
if (std::get<1>(rhs) < std::get<1>(lhs)) return false;
...
return std::get<N - 1>(lhs) < std::get<N - 1>(rhs);
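Spelled out for a std::pair, that C++17 rule looks roughly like the minimal sketch below. It is not any standard library's actual code: cxx17_pair_less is a hypothetical name, and ci_less is an assumed Boost-free stand-in for boost::ilexicographical_compare (ASCII only, via std::tolower). The point is that only operator< is ever called on the members, so "one" and "ONE" come out equivalent.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Assumed stand-in for boost::ilexicographical_compare (ASCII only).
inline bool ci_less(std::string const& a, std::string const& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Same shape as the question's CiKey: publicly derived from std::string.
struct CiKey : std::string {
    using std::string::string;
    bool operator<(CiKey const& other) const { return ci_less(*this, other); }
};

// Hypothetical helper: the C++17 lexicographic rule for a pair, written out.
// It only ever calls operator< on the members, never operator<=>.
template <class T1, class T2>
bool cxx17_pair_less(std::pair<T1, T2> const& lhs, std::pair<T1, T2> const& rhs) {
    if (lhs.first < rhs.first) return true;
    if (rhs.first < lhs.first) return false;
    return lhs.second < rhs.second;
}
```

With std::pair<CiKey, int> values {"one", 1} and {"ONE", 1}, neither ordering holds, so a std::set built on this rule keeps only one of them.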

I understand that, technically, this no longer applies since C++20, where it is replaced by:

Compares lhs and rhs lexicographically by synthesized three-way comparison (see below), that is, compares the first elements, if they are equivalent, compares the second elements, if those are equivalent, compares the third elements, and so on

Together with

The <, <=, >, >=, and != operators are synthesized from operator<=> and operator== respectively. (since C++20)

The thing is,

  • my type defines neither operator<=> nor operator==,

  • and, as this answer points out, providing operator< in addition would be fine and should be used when evaluating simple expressions like a < b.

  1. Is the behavior change in C++20 correct/on purpose?
  2. Should there be a diagnostic?
  3. Can we use other tools to spot silent breakage like this? Scanning entire codebases for uses of user-defined types in tuple/pair doesn't feel like it scales well.
  4. Are there other types, besides tuple/pair, that could manifest similar changes?

The basic problem comes from the fact that your type is incoherent and the standard library didn't call you on it until C++20. That is, your type was always kind of broken, but things were narrowly enough defined that you could get away with it.

Your type is broken because its comparison operators make no sense. It advertises that it is fully comparable, with all of the available comparison operators defined. This happens because you publicly inherited from std::string, so your type inherits those operators by implicit conversion to the base class. But this slate of comparisons behaves inconsistently, because you replaced only one of them with a comparison that doesn't work like the rest.

And since the behavior is inconsistent, what could happen is up for grabs once C++ actually cares about you being consistent.

A larger problem, however, is an inconsistency in how the standard treats operator<=>.

The C++ language is designed to give priority to explicitly defined comparison operators over synthesized ones. So if you compare two objects of your std::string-derived type directly, your operator< is used.
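A minimal sketch of this asymmetry, assuming an ASCII-only ci_less as a Boost-free stand-in for boost::ilexicographical_compare (the two helper function names are hypothetical): a direct a < b uses CiKey::operator<, while the same keys wrapped in std::pair go through pair's own comparison, which in C++20 is built on operator<=> and therefore on the inherited std::string ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Assumed stand-in for boost::ilexicographical_compare (ASCII only).
inline bool ci_less(std::string const& a, std::string const& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Same shape as the question's CiKey: publicly derived from std::string.
struct CiKey : std::string {
    using std::string::string;
    bool operator<(CiKey const& other) const { return ci_less(*this, other); }
};

// Direct comparison: the exact-match member operator< beats std::string's
// operators (which need a derived-to-base conversion) and any C++20 rewrite.
bool equivalent_directly(CiKey const& a, CiKey const& b) {
    return !(a < b) && !(b < a);
}

// Comparison through std::pair: in C++20, pair's ordering is built on
// operator<=>, which for CiKey is std::string's case-sensitive one.
bool equivalent_in_pair(CiKey const& a, CiKey const& b) {
    std::pair<CiKey, int> const pa{a, 0}, pb{b, 0};
    return !(pa < pb) && !(pb < pa);
}
```

In C++17 mode both helpers return true for "one" and "ONE"; in C++20 mode only equivalent_directly does.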

The C++ library, however, sometimes tries to be clever.

Some types, like optional<T>, attempt to forward the operators provided by a given type. optional<T> is designed to behave identically to T in its comparability, and it succeeds at this.

However, pair and tuple try to be a bit cleverer. In C++17, these types never actually forwarded comparison behavior; instead, they synthesized it from the existing operator< and operator== definitions of the element types.

So it's no surprise that their C++20 incarnations continue that fine tradition of synthesizing comparisons. Of course, since the language itself now synthesizes comparisons, the C++20 versions decided it was best to just follow the language's rules.

Except... they couldn't follow them exactly. There's no way to detect whether a < comparison is synthesized or user-provided, so there's no way to replicate the language behavior in one of these types. However, you can detect whether a type is three-way comparable.

So they make an assumption: if your type is three-way comparable, then your type relies on synthesized operators (if it isn't, they use an improved form of the old method). That is a reasonable assumption; after all, since <=> is a new feature, old types can't possibly have one.

Unless, of course, an old type inherits from a new type that gained three-way comparability. And there's no way for a type to detect that either; it either is three-way comparable or it isn't.

Now, fortunately, the synthesized three-way comparison operators of pair and tuple are perfectly capable of mimicking the C++17 behavior if your type doesn't offer three-way comparison. So you can get back the old behavior in C++20 by explicitly "dis-inheriting" the three-way comparison operator, i.e. by declaring a deleted operator<=> overload in your type.
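A minimal sketch of that workaround, assuming the same CiKey shape as in the question, with an ASCII-only ci_less standing in for boost::ilexicographical_compare (unique_mappings is a hypothetical test helper). The deleted member is an exact match, so it beats std::string's inherited operator<=> template; a <=> b becomes ill-formed, CiKey stops being three_way_comparable, and pair falls back to CiKey::operator<.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>   // in C++20, <string> also brings in <compare>
#include <utility>

// Assumed stand-in for boost::ilexicographical_compare (ASCII only).
inline bool ci_less(std::string const& a, std::string const& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

struct CiKey : std::string {
    using std::string::string;
    using std::string::operator=;

    bool operator<(CiKey const& other) const { return ci_less(*this, other); }

#if defined(__cpp_lib_three_way_comparison)
    // Exact-match overload that wins over std::string's operator<=>
    // (reachable via derived-to-base conversion) and is deleted.
    std::weak_ordering operator<=>(CiKey const&) const = delete;
#endif
};

#if defined(__cpp_lib_three_way_comparison)
static_assert(!std::three_way_comparable<CiKey>);
#endif

// Hypothetical test helper mirroring the question's Mappings set.
std::size_t unique_mappings() {
    std::set<std::pair<CiKey, int>> const mappings{
        {"one", 1}, {"two", 2}, {"ONE", 1}, {"three", 3}};
    return mappings.size();
}
```

unique_mappings() should return 3 in both C++17 and C++20 mode.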

Alternatively, you could inherit privately and publicly re-expose only the specific APIs you want with using-declarations.
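A minimal sketch of the private-inheritance variant, again assuming an ASCII-only ci_less in place of boost::ilexicographical_compare. The set of re-exposed members, the str() accessor and the unique_keys helper are illustrative choices, not anything prescribed. The sketch exercises CiKey directly and through std::set<CiKey>; how std::pair/std::tuple's three-way-comparability check treats the now-inaccessible std::string base is not exercised here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Assumed stand-in for boost::ilexicographical_compare (ASCII only).
inline bool ci_less(std::string const& a, std::string const& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Private base: outside code can no longer convert CiKey to std::string,
// so std::string's comparison operators are not usable on CiKey from outside.
struct CiKey : private std::string {
    using std::string::string;  // inherited constructors remain accessible
    using std::string::begin;   // re-expose only what is actually needed
    using std::string::end;
    using std::string::size;
    using std::string::empty;
    using std::string::c_str;

    // Explicit, read-only access to the underlying string.
    std::string const& str() const { return *this; }

    bool operator<(CiKey const& other) const { return ci_less(str(), other.str()); }
};

// Hypothetical test helper mirroring the question's KeySet.
std::size_t unique_keys() {
    std::set<CiKey> const keys{"one", "two", "ONE", "three"};
    return keys.size();
}
```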

Is the behavior change in C++20 correct/on purpose?

That depends on what you mean by "on purpose".

Publicly inheriting from types like std::string has always been somewhat morally dubious. Not so much because of the slicing/destructor problem, but more because it is kind of a cheat. Inheriting such types directly opens you up to changes in the API that you didn't expect and may not be appropriate for your type.

The new comparison versions of pair and tuple are doing their jobs, and doing them as well as C++ permits. It's just that your type inherited something it didn't want. If you had privately inherited from std::string and exposed only the functionality you wanted via using-declarations, your type would likely be fine.

Should there be a diagnostic?

This can't be diagnosed without some kind of compiler intrinsic.

Can we use other tools to spot silent breakage like this?

Search for cases where you publicly inherit from standard library types.
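Beyond grepping, a check in unit tests can flag the inconsistency itself. The helper below is a hypothetical sketch (orderings_agree is not a standard or library facility; ci_less is an assumed ASCII-only stand-in for boost::ilexicographical_compare). In C++20 mode, it compares a type's operator< against whatever operator<=> overload resolution finds for it, including one inherited from a base class.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>   // in C++20, <string> also brings in <compare>

// Assumed stand-in for boost::ilexicographical_compare (ASCII only).
inline bool ci_less(std::string const& a, std::string const& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
}

// Same shape as the question's CiKey: publicly derived from std::string.
struct CiKey : std::string {
    using std::string::string;
    bool operator<(CiKey const& other) const { return ci_less(*this, other); }
};

// Hypothetical audit helper: returns false if operator< and operator<=>
// disagree for this pair of sample values.
template <class T>
bool orderings_agree(T const& a, T const& b) {
#if defined(__cpp_lib_three_way_comparison)
    if constexpr (std::three_way_comparable<T>) {
        auto const c = a <=> b;
        return (c < 0) == (a < b) && (c > 0) == (b < a);
    }
#endif
    return true;  // no operator<=> to disagree with
}
```

Applied to CiKey{"one"} and CiKey{"ONE"} in C++20 mode, it returns false, which is exactly the disagreement that std::pair and std::tuple silently trip over. It only catches the problem for the sample values you feed it.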

Ah! @StoryTeller nailed it with their comment:

"my type doesn't define operator<=> nor operator==" - but std::string does, making it a candidate due to the d[e]rived-to-base conversion. I believe all standard library types that support comparison had their members overhauled.

Indeed, a much quicker work-around is:

#if defined(__cpp_lib_three_way_comparison)
    std::weak_ordering operator<=>(
        CiKey const&) const = delete;
#endif

Success! Compiler Explorer

Better Ideas

StoryTeller's second comment hints at a better solution:

I guess non-virtual destructors are no longer the sole compelling reason to avoid inheriting from standard library containers:/

That is, avoid inheritance here altogether:

// represents case-insensitive keys
struct CiKey {
    std::string _value;

    bool operator<(CiKey const& other) const {
        return boost::ilexicographical_compare(_value, other._value);
    }
};

Of course, this requires some downstream changes to the code that uses CiKey, but it's conceptually purer and insulates against this kind of "standard creep" in the future.

Compiler Explorer

#include <boost/algorithm/string.hpp>
#include <cassert>
#include <iostream>
#include <set>
#include <string>
#include <tuple>
#include <version>

// represents case-insensitive keys
struct CiKey {
    std::string _value;

    bool operator<(CiKey const& other) const {
        return boost::ilexicographical_compare(_value, other._value);
    }
};

using KeySet   = std::set<CiKey>;
using Mapping  = std::tuple<CiKey, int>;
using Mappings = std::set<Mapping>;

int main()
{
    KeySet keys { { "one" }, { "two" }, { "ONE" }, { "three" } };
    Mappings mappings { { { "one" }, 1 }, { { "two" }, 2 }, { { "ONE" }, 1 },
        { { "three" }, 3 } };

    assert(keys.size() == 3);
    assert(mappings.size() == 3);
}

Remaining Questions

How can we diagnose problems like these? They're so subtle that they will escape code review. The situation is made worse by two decades of standard C++ in which this worked perfectly fine and predictably.

As a side note, I guess we can expect any "lifted" operators (thinking of std::variant/std::optional) to have similar pitfalls when used with user-defined types that inherit too much from standard library types.

This is not really an answer about the changed behavior of std::string's comparison operators, but I must point out that case-insensitive strings should be created via the customization template parameter Traits.

Example:

// definition of basic_string:
template<
    class CharT,
    class Traits = std::char_traits<CharT>,   // <- this is the customization point.
    class Allocator = std::allocator<CharT>
> class basic_string;

The case-insensitive string example comes almost straight out of cppreference (https://en.cppreference.com/w/cpp/string/char_traits). I've added type aliases for case-insensitive strings.

#include <cctype>
#include <cwctype>
#include <iostream>
#include <locale>
#include <string>
#include <version>

template <typename CharT> struct ci_traits : public std::char_traits<CharT>
{
    #ifdef __cpp_lib_constexpr_char_traits
    #define CICE constexpr
    #else
    #define CICE
    #endif

private:
    using base = std::char_traits<CharT>;
    using int_type = typename base::int_type;

    static CICE CharT to_upper(CharT ch)
    {
        if constexpr (sizeof(CharT) == 1)
            return std::toupper(static_cast<unsigned char>(ch));
        else
            return std::toupper(CharT(ch & 0xFFFF), std::locale{});
    }

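public:
    using base::to_int_type;
    using base::to_char_type;
#ifdef __cpp_lib_three_way_comparison
    // Case-insensitive equality is not substitutability: don't inherit
    // std::char_traits' strong_ordering as the comparison category.
    using comparison_category = std::weak_ordering;
#endif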

    static CICE bool eq(CharT c1, CharT c2)
    {
        return to_upper(c1) == to_upper(c2);
    }
    static CICE bool lt(CharT c1, CharT c2)
    {
        return to_upper(c1) < to_upper(c2);
    }
    static CICE bool eq_int_type(const int_type& c1, const int_type& c2)
    {
        return to_upper(to_char_type(c1)) == to_upper(to_char_type(c2));
    }
    static CICE int compare(const CharT *s1, const CharT *s2, std::size_t n)
    {
        while (n-- != 0)
        {
            if (to_upper(*s1) < to_upper(*s2))
                return -1;
            if (to_upper(*s1) > to_upper(*s2))
                return 1;
            ++s1;
            ++s2;
        }
        return 0;
    }
    static CICE const CharT *find(const CharT *s, std::size_t n, CharT a)
    {
        auto const ua(to_upper(a));
        while (n-- != 0) {
            if (to_upper(*s) == ua)
                return s;
            s++;
        }
        return nullptr;
    }
    #undef CICE
};

using ci_string = std::basic_string<char, ci_traits<char>>;
using ci_wstring = std::basic_string<wchar_t, ci_traits<wchar_t>>;

// TODO consider constexpr support
template <typename CharT, typename Alloc>
inline std::basic_string<CharT, std::char_traits<CharT>, Alloc> string_cast(
    const std::basic_string<CharT, ci_traits<CharT>, Alloc> &src)
{
    return std::basic_string<CharT, std::char_traits<CharT>, Alloc>{
        src.begin(), src.end(), src.get_allocator()};
}

template <typename CharT, typename Alloc>
inline std::basic_string<CharT, ci_traits<CharT>, Alloc> ci_string_cast(
    const std::basic_string<CharT, std::char_traits<CharT>, Alloc> &src)
{
    return std::basic_string<CharT, ci_traits<CharT>, Alloc>{src.begin(), src.end(),
                                                             src.get_allocator()};
}

int main(int argc, char**) {
    if (argc<=1)
    {
        std::cout << "char\n";
        ci_string hello = "hello";
        ci_string Hello = "Hello";

        // convert a ci_string to a std::string
        std::string x = string_cast(hello);

        // convert a std::string to a ci_string
        auto ci_hello = ci_string_cast(x);

        if (hello == Hello)
            std::cout << string_cast(hello) << " and " << string_cast(Hello)
                    << " are equal\n";

        if (hello == "HELLO")
            std::cout << string_cast(hello) << " and "
                    << "HELLO"
                    << " are equal\n";
    }
    else
    {
        std::cout << "wchar_t\n";
        ci_wstring hello = L"hello";
        ci_wstring Hello = L"Hello";

        // convert a ci_wstring to a std::wstring
        std::wstring x = string_cast(hello);

        // convert a std::wstring to a ci_wstring
        auto ci_hello = ci_string_cast(x);

        if (hello == Hello)
            std::wcout << string_cast(hello) << L" and " << string_cast(Hello) << L" are equal\n";

        if (hello == L"HELLO")
            std::wcout << string_cast(hello) << L" and " << L"HELLO" << L" are equal\n";
    }
}

You can play with it here: https://godbolt.org/z/5ec5sz
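To connect this back to the original problem, here is a minimal, char-only sketch of the Traits approach inside a std::set of std::pair. ci_char_traits, ci_str and unique_ci_mappings are hypothetical names, and the traits are a stripped-down, ASCII-only variant of the ci_traits above rather than a complete implementation. std::basic_string's comparisons, including its C++20 operator<=>, all go through Traits::compare, so every path agrees on case-insensitivity and "one" and "ONE" collapse in both C++17 and C++20. In C++20 mode, it also declares comparison_category as std::weak_ordering, since otherwise std::char_traits<char>'s strong_ordering would be inherited.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>   // in C++20, <string> also brings in <compare>
#include <utility>

// Hypothetical, stripped-down case-insensitive traits for char (ASCII only).
struct ci_char_traits : std::char_traits<char> {
    static char fold(char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (; n != 0; --n, ++s1, ++s2) {
            if (lt(*s1, *s2)) return -1;
            if (lt(*s2, *s1)) return 1;
        }
        return 0;
    }
#if defined(__cpp_lib_three_way_comparison)
    // "one" and "ONE" compare equal without being interchangeable.
    using comparison_category = std::weak_ordering;
#endif
};

using ci_str = std::basic_string<char, ci_char_traits>;

// Hypothetical test helper mirroring the question's Mappings set.
std::size_t unique_ci_mappings() {
    std::set<std::pair<ci_str, int>> const mappings{
        {"one", 1}, {"two", 2}, {"ONE", 1}, {"three", 3}};
    return mappings.size();
}
```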
