
Hashing raw bytes in C++?

I want to write a function that takes two values of types T and U, with sizeof(T) + sizeof(U) <= 8, and produces a uint64_t by simply reinterpreting their bytes one after the other. However, this does not seem to work. I am sure there is a quicker, more elegant (and correct) way to do it, but I have no clue. Any tips are greatly appreciated.

#include <cstdint>
#include <iostream>
#include <vector>

template <typename T, typename U>
constexpr auto hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8);

  uint64_t u = 0;
  uint64_t v = 0;
  auto px = (uint8_t*)&x;
  auto py = (uint8_t*)&y;
  for (auto i = 0; i < sizeof(T); ++i) {
    u |= (uint64_t)px[i];
    u <<= 8;
  }
  for (auto i = 0; i < sizeof(U); ++i) {
    v |= (uint64_t)py[i];
    v <<= 8;
  }

  return u << (sizeof(U) * 8) | v;
}

int main() {
  std::cout << hash8(131, 0) << '\n';
  std::cout << hash8(132, 0) << '\n';
  std::cout << hash8(500, 0) << '\n';
}
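A likely cause of the problem is in the loops. This is an editorial note and is not from the original thread. In both loops the shift happens after the OR, so each accumulated value ends up shifted 8 bits too far. The final u << (sizeof(U) * 8) then pushes the top byte of u out of the 64-bit range, and v's top byte overlaps u. For example, on a little-endian machine hash8(131, 0) and hash8(132, 0) both lose the only non-zero byte and collide. The sketch below is a hypothetical corrected version, renamed hash8_fixed. It shifts before ORing, uses an unsigned loop index, and drops constexpr because the byte access still needs a reinterpret_cast:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fix of the question's loop-based hash8 (renamed hash8_fixed).
// Bytes are read in memory order; the first byte read ends up most significant.
template <typename T, typename U>
std::uint64_t hash8_fixed(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= sizeof(std::uint64_t));

  // Inspecting an object's bytes through an unsigned char pointer is allowed.
  const auto* px = reinterpret_cast<const unsigned char*>(&x);
  const auto* py = reinterpret_cast<const unsigned char*>(&y);

  std::uint64_t u = 0;
  std::uint64_t v = 0;
  // Shift first, then OR, so the last byte is not shifted an extra 8 bits.
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    u = (u << 8) | px[i];
  }
  for (std::size_t i = 0; i < sizeof(U); ++i) {
    v = (v << 8) | py[i];
  }
  // v fills the low sizeof(U) bytes and u sits directly above it, so nothing overlaps.
  return (u << (sizeof(U) * 8)) | v;
}
```

This places the bytes differently from the memcpy-based answers below. For multi-byte types the result still depends on the machine's byte order.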

The easiest way is usually to use memcpy:

#include <cstdint>
#include <cstring> // for memcpy

template <typename T, typename U>
auto hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8);

  uint64_t u = 0;
  char* u_ptr = reinterpret_cast<char*>(&u);
  std::memcpy(u_ptr, &x, sizeof x);
  std::memcpy(u_ptr+sizeof x, &y, sizeof y);
  return u;
}

Any decent optimizing compiler will inline the memcpy call as a few bit operations when the size argument is a compile-time constant (and reasonably small).

If you actually need a constexpr function, you can try using std::bit_cast from C++20. This may be awkward if either argument's size is not 1, 2, 4, or 8, because then there is no standard integer type of the same size to bit_cast to.

I cannot pinpoint the problem in your code without more details, but I can propose a perhaps simpler solution.

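Firstly, I recommend adding a check that the argument types have unique object representations. Otherwise the hash would be meaningless: padding bytes, or several bit patterns representing the same value, could make equal values produce different hashes.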
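Here is a small illustration of what the check accepts and rejects. Padded and Tight are hypothetical example types. The comments assume a mainstream ABI where std::int32_t is 4-byte aligned, so Padded carries 3 padding bytes after c. Note that sizeof(Padded) is still 8, so it would pass the size check alone:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical example types (assuming 4-byte alignment for std::int32_t).
struct Padded {
  char c;          // followed by 3 padding bytes with unspecified contents
  std::int32_t i;
};

struct Tight {
  std::int32_t a;  // no padding between or after the members
  std::int32_t b;
};

// Unsigned integers: every bit pattern is a distinct value, so the check passes.
static_assert(std::has_unique_object_representations_v<std::uint32_t>);
// No padding: equal Tight objects always have identical bytes.
static_assert(std::has_unique_object_representations_v<Tight>);
// Padding bytes may differ between equal Padded objects, so hashing raw bytes is unreliable.
static_assert(!std::has_unique_object_representations_v<Padded>);
```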

Secondly, std::memcpy might make this simpler:

#include <cstdint>
#include <cstring>
#include <memory>
#include <type_traits>

template <typename T, typename U>
auto
hash8(T x, U y) noexcept {
    static_assert(sizeof x + sizeof y <= sizeof(std::uint64_t));
    static_assert(std::has_unique_object_representations_v<T>);
    static_assert(std::has_unique_object_representations_v<U>);
    std::uint64_t ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    std::memcpy(ptr, std::addressof(x), sizeof x);
    ptr += sizeof x;
    std::memcpy(ptr, std::addressof(y), sizeof y);
    return ret;
}

Next, we can generalise this to an arbitrary number of arguments (as long as they fit) and to different return types:

template <typename R = std::uint64_t, typename... Args>
auto
hash(Args... args) noexcept {
    static_assert((sizeof args + ...) <= sizeof(R));
    static_assert((std::has_unique_object_representations_v<Args> && ...));
    static_assert(std::has_unique_object_representations_v<R>);
    R ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    (
        (
            std::memcpy(ptr, std::addressof(args), sizeof args),
            ptr += sizeof args
        ), ...
    );
    return ret;
}

There is a caveat: a hash such as this is not the same across different systems, even if the sizes of the objects match. For example, byte order differs between little-endian and big-endian machines.
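If the value must be identical on every platform and all arguments are unsigned integers, one option is to combine the values with shifts instead of copying bytes. Fixed-width types such as std::uint32_t are preferable here, since the widths of unsigned int and unsigned long vary. The sketch below, with the hypothetical name pack_le, puts the first argument in the lowest bits. For these types this matches what the memcpy versions produce on a little-endian machine. Because it uses no reinterpret_cast, it also works in constant expressions under C++17:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: packs unsigned integers by value, first argument in the
// lowest bits. Operating on values rather than bytes makes the result
// independent of the machine's byte order.
template <typename... Args>
constexpr std::uint64_t pack_le(Args... args) noexcept {
  static_assert((std::is_unsigned_v<Args> && ...), "unsigned integer arguments only");
  static_assert((sizeof(Args) + ...) <= sizeof(std::uint64_t));

  std::uint64_t ret = 0;
  std::size_t shift = 0;
  // For each argument in order: OR it in at the current bit offset, then
  // advance the offset by the argument's width in bits.
  ((ret |= static_cast<std::uint64_t>(args) << shift, shift += 8 * sizeof(Args)), ...);
  return ret;
}
```

The shift amount never reaches 64 before an OR, because the size check guarantees that every argument fits above the ones before it.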

PS: It's pointless to make your function constexpr, because your C-style pointer casts are reinterpret casts, and those aren't allowed in constant expressions.
