
How to compare generic structs in C++?

I want to compare structs in a generic way and I've done something like this (I cannot share the actual source, so ask for more details if necessary):

template<typename Data>
bool structCmp(Data data1, Data data2)
{
  void* dataStart1 = (std::uint8_t*)&data1;
  void* dataStart2 = (std::uint8_t*)&data2;
  return memcmp(dataStart1, dataStart2, sizeof(Data)) == 0;
}

This mostly works as intended, except that it sometimes returns false even though the two struct instances have identical members (I've checked with the Eclipse debugger). After some searching I discovered that memcmp can fail because the struct is padded.

Is there a more proper way of comparing memory that's indifferent to padding? I'm not able to modify the structs (they're part of an API I'm using), and the many different structs have differing members and thus cannot be compared member by member in a generic way (to my knowledge).

Edit: I'm unfortunately stuck with C++11. Should've mentioned this earlier...

You are right that padding gets in your way of comparing arbitrary types in this way.
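To make the failure concrete, here is a minimal sketch. `Padded` and all helper names are made up for illustration; on many common ABIs the compiler inserts three padding bytes after `c`, but the exact layout is implementation-defined. The helper fills the whole object with a chosen byte before assigning the members, so two objects with equal members can still carry different bytes in the padding.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical struct: a char followed by an int usually leaves a gap.
struct Padded {
  char c;
  int i;
};

// Bytes of Padded that belong to no member (implementation-defined, often 3).
constexpr std::size_t padding_bytes() {
  return sizeof(Padded) - (sizeof(char) + sizeof(int));
}

// Fill every byte with `fill`, then assign the members. In practice the
// padding keeps the fill value, although the standard does not promise it.
inline void init(Padded& p, unsigned char fill, char c, int i) {
  std::memset(&p, fill, sizeof p);
  p.c = c;
  p.i = i;
}

// What the question does: compare raw bytes, padding included.
inline bool bytes_equal(Padded const& a, Padded const& b) {
  return std::memcmp(&a, &b, sizeof(Padded)) == 0;
}

// What is actually wanted: compare the members only.
inline bool members_equal(Padded const& a, Padded const& b) {
  return a.c == b.c && a.i == b.i;
}
```

After `init(a, 0x00, 'x', 7)` and `init(b, 0xFF, 'x', 7)`, `members_equal(a, b)` is true, while `bytes_equal(a, b)` is typically false whenever `padding_bytes()` is nonzero.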

There are measures you can take:

  • If you are in control of Data, then e.g. gcc has __attribute__((packed)). It has an impact on performance, but it might be worth a try. Though I have to admit that I don't know whether packed lets you disallow padding completely. The gcc doc says:

This attribute, attached to struct or union type definition, specifies that each member of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.

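  • If you are not in control of Data, std::has_unique_object_representations<T> (C++17) can at least tell you whether such a comparison gives correct results. cppreference says:

If T is TriviallyCopyable and if any two objects of type T with the same value have the same object representation, provides the member constant value equal true. For any other type, value is false.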

and further:

This trait was introduced to make it possible to determine whether a type can be correctly hashed by hashing its object representation as a byte array.

PS: I only addressed padding, but don't forget that types whose instances can compare equal despite having different representations in memory are by no means rare (e.g. std::string, std::vector and many others).
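As a rough sketch of how std::has_unique_object_representations could guard a byte-wise comparison: the function name `bytewise_equal` is hypothetical, and the code needs C++17, so it does not help anyone limited to C++11. Types with padding or floating-point members are rejected at compile time instead of silently giving wrong answers.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: only compiles for types where equal values are
// guaranteed to have equal bytes (no padding, no floating point, ...).
template<typename T>
bool bytewise_equal(T const& a, T const& b) {
  static_assert(std::has_unique_object_representations<T>::value,
                "T may have padding or several representations per value");
  return std::memcmp(&a, &b, sizeof(T)) == 0;
}
```

On mainstream platforms `bytewise_equal(42, 42)` compiles and returns true, whereas calling it with a padded struct or a `double` fails the static_assert.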

No, memcmp is not suitable for this. And reflection in C++ is currently insufficient to do it (experimental compilers with reflection support strong enough for this are appearing, and a future standard might have the features you need).

Without built-in reflection, the easiest way to solve your problem is to do some manual reflection.

Take this:

struct some_struct {
  int x;
  double d1, d2;
  char c;
};

We want to do the minimal amount of work so we can compare two of these.

If we have:

auto as_tie(some_struct const& s){ 
  return std::tie( s.x, s.d1, s.d2, s.c );
}

or

auto as_tie(some_struct const& s)
-> decltype(std::tie( s.x, s.d1, s.d2, s.c ))
{
  return std::tie( s.x, s.d1, s.d2, s.c );
}

for C++11, then:

template<class S>
bool are_equal( S const& lhs, S const& rhs ) {
  return as_tie(lhs) == as_tie(rhs);
}

does a pretty decent job.
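A self-contained C++11 sketch of the same idea, using a made-up `api_point` as a stand-in for a third-party struct that cannot be modified. The `is_less` helper is an addition for illustration: std::tie also provides lexicographic ordering, so it comes almost for free.

```cpp
#include <tuple>

// Hypothetical stand-in for an API struct; padding usually follows `tag`.
struct api_point {
  char tag;
  double x, y;
};

// The only per-struct code: list the members once.
auto as_tie(api_point const& p) -> decltype(std::tie(p.tag, p.x, p.y)) {
  return std::tie(p.tag, p.x, p.y);
}

template<class S>
bool are_equal(S const& lhs, S const& rhs) {
  return as_tie(lhs) == as_tie(rhs);  // member-wise, padding never inspected
}

template<class S>
bool is_less(S const& lhs, S const& rhs) {
  return as_tie(lhs) < as_tie(rhs);   // lexicographic over the listed members
}
```

Note that the members are compared with their own `==`, so a `double` member holding NaN never compares equal, and `-0.0` compares equal to `0.0`, which differs from what a byte comparison would report.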

We can extend this process to be recursive with a bit of work; instead of comparing ties, compare each element wrapped in a template, and that template's operator== recursively applies this rule (wrapping the element in as_tie to compare) unless the element already has a working == , and handles arrays.

This will require a bit of a library (100ish lines of code?) together with writing a bit of manual per-member "reflection" data. If the number of structs you have is limited, it might be easier to write per-struct code manually.


There are probably ways to get

REFLECT( some_struct, x, d1, d2, c )

to generate the as_tie structure using horrible macros. But as_tie is simple enough. In C++11 the repetition is annoying; this is useful:

#define RETURNS(...) \
  noexcept(noexcept(__VA_ARGS__)) \
  -> decltype(__VA_ARGS__) \
  { return __VA_ARGS__; }

in this situation and many others. With RETURNS , writing as_tie is:

auto as_tie(some_struct const& s)
  RETURNS( std::tie( s.x, s.d1, s.d2, s.c ) )

removing the repetition.


Here is a stab at making it recursive:

template<class T,
  typename std::enable_if< !std::is_class<T>{}, bool>::type = true
>
auto refl_tie( T const& t )
  RETURNS(std::tie(t))

template<class...Ts,
  typename std::enable_if< (sizeof...(Ts) > 1), bool>::type = true
>
auto refl_tie( Ts const&... ts )
  RETURNS(std::make_tuple(refl_tie(ts)...))

template<class T, std::size_t N>
auto refl_tie( T const(&t)[N] ) {
  // lots of work in C++11 to support this case, todo.
  // in C++17 I could just make a tie of each of the N elements of the array?

  // in C++11 I might write a custom struct that supports an array
  // reference/pointer of fixed size and implements =, ==, !=, <, etc.
}

struct foo {
  int x;
};
struct bar {
  foo f1, f2;
};
auto refl_tie( foo const& s )
  RETURNS( refl_tie( s.x ) )
auto refl_tie( bar const& s )
  RETURNS( refl_tie( s.f1, s.f2 ) )
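For reference, here is a sketch of the non-array part written out without the RETURNS macro, so it compiles as plain C++11. The `refl_equal` helper is a hypothetical name added for the usage example. The order of declarations matters: the leaf overload must be visible before the variadic one, and the per-struct overloads are found through argument-dependent lookup.

```cpp
#include <tuple>
#include <type_traits>

// Leaf case: anything that is not a class is tied directly.
template<class T,
  typename std::enable_if<!std::is_class<T>::value, bool>::type = true>
auto refl_tie(T const& t) -> decltype(std::tie(t)) {
  return std::tie(t);
}

// Several members: reflect each one and collect the results in a tuple.
template<class... Ts,
  typename std::enable_if<(sizeof...(Ts) > 1), bool>::type = true>
auto refl_tie(Ts const&... ts)
  -> decltype(std::make_tuple(refl_tie(ts)...)) {
  return std::make_tuple(refl_tie(ts)...);
}

struct foo { int x; };
struct bar { foo f1, f2; };

// Manual "reflection": one overload per struct, listing its members.
auto refl_tie(foo const& s) -> decltype(refl_tie(s.x)) {
  return refl_tie(s.x);
}
auto refl_tie(bar const& s) -> decltype(refl_tie(s.f1, s.f2)) {
  return refl_tie(s.f1, s.f2);
}

// Hypothetical convenience wrapper around the comparison.
template<class S>
bool refl_equal(S const& lhs, S const& rhs) {
  return refl_tie(lhs) == refl_tie(rhs);
}
```

With this in place, `refl_equal(bar{{1},{2}}, bar{{1},{2}})` compares `f1.x` and `f2.x` through the nested tuples and never looks at padding.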

refl_tie(array) (fully recursive, even supports arrays-of-arrays):

template<class T, std::size_t N, std::size_t...Is>
auto array_refl( T const(&t)[N], std::index_sequence<Is...> )
  RETURNS( std::array<decltype( refl_tie(t[0]) ), N>{ refl_tie( t[Is] )... } )

template<class T, std::size_t N>
auto refl_tie( T(&t)[N] )
  RETURNS( array_refl( t, std::make_index_sequence<N>{} ) )

Live example .

Here I use a std::array of refl_tie . This is much faster than my previous tuple of refl_tie at compile time.

Also

template<class T,
  typename std::enable_if< !std::is_class<T>{}, bool>::type = true
>
auto refl_tie( T const& t )
  RETURNS(std::cref(t))

using std::cref here instead of std::tie could save on compile-time overhead, as cref is a much simpler class than tuple .

Finally, you should add

template<class T, std::size_t N, class...Ts>
auto refl_tie( T(&t)[N], Ts&&... ) = delete;

which will prevent array members from decaying to pointers and falling back on pointer-equality (which you probably don't want from arrays).

Without this, if you pass in an array of non-reflected structs, the call falls back on the pointer-to-non-reflected-struct refl_tie, which compiles and returns nonsense.

With this, you end up with a compile-time error.


Support for recursion through library types is tricky. You could std::tie them:

template<class T, class A>
auto refl_tie( std::vector<T, A> const& v )
  RETURNS( std::tie(v) )

but that doesn't support recursion through it.
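One possible way around this, sketched under the assumption that a comparison function is acceptable in place of a tie: compare containers element by element and recurse into each element. The struct `item` and the name `refl_equal` are hypothetical, and only std::vector is handled here.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical reflected struct.
struct item {
  int id;
  double weight;
};

auto refl_tie(item const& s) -> decltype(std::tie(s.id, s.weight)) {
  return std::tie(s.id, s.weight);
}

// Generic case: compare through the struct's refl_tie.
template<class T>
bool refl_equal(T const& a, T const& b) {
  return refl_tie(a) == refl_tie(b);
}

// Vector case: same length, and every pair of elements is refl_equal.
// This overload is more specialized, so it wins for vectors and also
// handles vectors of vectors by recursion.
template<class T, class A>
bool refl_equal(std::vector<T, A> const& a, std::vector<T, A> const& b) {
  if (a.size() != b.size()) return false;
  return std::equal(a.begin(), a.end(), b.begin(),
                    [](T const& x, T const& y) { return refl_equal(x, y); });
}
```

The trade-off is that a vector nested inside a struct then needs its own handling in that struct's comparison, since it no longer fits into a single tuple of references.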

In short: Not possible in a generic way.

The problem with memcmp is that the padding may contain arbitrary data, and hence the memcmp may fail. If there were a way to find out where the padding is, you could zero out those bits and then compare the data representations; that would check for equality if the members are trivially comparable (which is not the case for, e.g., std::string, since two strings can contain different pointers while the char arrays they point to are equal). But I know of no way to get at the padding of structs. You can try to tell your compiler to pack the structs, but this will make accesses slower and is not really guaranteed to work.

The cleanest way to implement this is to compare all members. Of course this is not really possible in a generic way (until we get compile-time reflection and metaclasses in C++23 or later). From C++20 onward, one could generate a default operator<=>, but it has to be declared inside the class (as a member or a friend), so again this is not really applicable. If you are lucky and all structs you want to compare have an operator== defined, you can of course just use that. But that is not guaranteed.

EDIT: Ok, there is actually a totally hacky and somewhat generic way for aggregates. (I only wrote the conversion to tuples, those have a default comparison operator). godbolt
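The linked code is not reproduced here. What follows is a sketch of one commonly used technique for this, not necessarily what the link contained: detect the number of fields by testing brace-initialization with a convert-to-anything type, then unpack with structured bindings. All names are hypothetical, it requires C++17, it is capped at four fields, and it is only meant for flat aggregates of scalar members (array members and nested aggregates confuse the field count through brace elision).

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Pretends to convert to anything; only used in unevaluated contexts.
struct any_type {
  template<class T> operator T() const;
};

// Overload pair detecting whether T{args...} is well-formed.
template<class T, class... Args>
auto braces_ok(int)
  -> decltype(void(T{std::declval<Args>()...}), std::true_type{});
template<class T, class... Args>
auto braces_ok(...) -> std::false_type;

template<class T, class... Args>
constexpr bool is_braces_constructible =
  decltype(braces_ok<T, Args...>(0))::value;

// Try the largest field count first, since fewer initializers always work.
template<class T>
auto to_tuple(T const& obj) {
  using A = any_type;
  if constexpr (is_braces_constructible<T, A, A, A, A>) {
    auto const& [m0, m1, m2, m3] = obj;
    return std::make_tuple(m0, m1, m2, m3);
  } else if constexpr (is_braces_constructible<T, A, A, A>) {
    auto const& [m0, m1, m2] = obj;
    return std::make_tuple(m0, m1, m2);
  } else if constexpr (is_braces_constructible<T, A, A>) {
    auto const& [m0, m1] = obj;
    return std::make_tuple(m0, m1);
  } else if constexpr (is_braces_constructible<T, A>) {
    auto const& [m0] = obj;
    return std::make_tuple(m0);
  } else {
    return std::make_tuple();
  }
}

// Example aggregate, mirroring the kind of struct discussed here.
struct XYZ {
  int x;
  char y;
  long z;
};
```

Two objects can then be compared with `to_tuple(a) == to_tuple(b)`, which copies the members into tuples and uses the tuples' own comparison.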

C++20 supports defaulted comparisons

#include <iostream>
#include <compare>

struct XYZ
{
    int x;
    char y;
    long z;

    auto operator<=>(const XYZ&) const = default;
};

int main()
{
    XYZ obj1 = {4,5,6};
    XYZ obj2 = {4,5,6};

    if (obj1 == obj2)
    {
        std::cout << "objects are identical\n";
    }
    else
    {
        std::cout << "objects are not identical\n";
    }
    return 0;
}

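Assuming POD data, the default assignment operator only has to copy the members. (Actually I'm not 100% sure about that, so don't take my word for it: nothing guarantees that padding bytes are left untouched, and compilers are free to copy the whole object, padding included.)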

You can use this to your advantage:

template<typename Data>
bool structCmp(Data data1, Data data2) // Data is POD
{
  Data tmp;
  memcpy(&tmp, &data1, sizeof(Data)); // copy data1 including padding
  tmp = data2;                        // copy data2 only members
  return memcmp(&tmp, &data1, sizeof(Data)) == 0; 
}

I believe you may be able to base a solution on Antony Polukhin's wonderfully devious voodoo in the magic_get library - for structs, not for complex classes.

With that library, we are able to iterate the different fields of a struct, with their appropriate type, in purely-general-templated code. Antony has used this, for example, to be able to stream arbitrary structs to an output stream with the correct types, completely generically. It stands to reason that comparison might also be a possible application of this approach.

... but you would need C++14. At least it's better than the C++17 and later suggestions in other answers :-P
