Hashing raw bytes in C++?

I want to write a function that takes two values of types T and U such that sizeof(T) + sizeof(U) <= 8 and produces a uint64_t by simply reinterpreting their bytes one after the other. However, this does not seem to work. I am certain there is a quicker, more elegant (and correct) way to do it, but I have no clue. Any tips are greatly appreciated.

#include <cstdint>
#include <iostream>
#include <vector>

template <typename T, typename U>
constexpr auto hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8);

  uint64_t u = 0;
  uint64_t v = 0;
  auto px = (uint8_t*)&x;
  auto py = (uint8_t*)&y;
  for (auto i = 0; i < sizeof(T); ++i) {
    u |= (uint64_t)px[i];
    u <<= 8;
  }
  for (auto i = 0; i < sizeof(U); ++i) {
    v |= (uint64_t)py[i];
    v <<= 8;
  }

  return u << (sizeof(U) * 8) | v;
}

int main() {
  std::cout << hash8(131, 0) << '\n';
  std::cout << hash8(132, 0) << '\n';
  std::cout << hash8(500, 0) << '\n';
}

The easiest way is usually to use memcpy:

#include <cstdint>
#include <cstring> // for memcpy

template <typename T, typename U>
auto hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8);

  uint64_t u = 0;
  char* u_ptr = reinterpret_cast<char*>(&u);
  std::memcpy(u_ptr, &x, sizeof x);
  std::memcpy(u_ptr+sizeof x, &y, sizeof y);
  return u;
}

Any decent compiler will inline the memcpy calls into a few bit operations if the size argument is known at compile time and reasonably small.
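As a rough illustration of what the memcpy version produces, the sketch below repeats the function and checks the result byte for byte against the inputs. The helpers bytes_match and check_examples are names made up for this example and are not part of the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring> // std::memcpy, std::memcmp

// Same approach as the answer: copy the bytes of x, then the bytes of y.
template <typename T, typename U>
std::uint64_t hash8(T x, U y) {
  static_assert(sizeof(T) + sizeof(U) <= 8, "arguments must fit in 64 bits");

  std::uint64_t u = 0;
  char* u_ptr = reinterpret_cast<char*>(&u);
  std::memcpy(u_ptr, &x, sizeof x);
  std::memcpy(u_ptr + sizeof x, &y, sizeof y);
  return u;
}

// Hypothetical helper: true if the result starts with the bytes of x,
// immediately followed by the bytes of y.
template <typename T, typename U>
bool bytes_match(T x, U y) {
  const std::uint64_t h = hash8(x, y);
  const char* p = reinterpret_cast<const char*>(&h);
  return std::memcmp(p, &x, sizeof x) == 0 &&
         std::memcmp(p + sizeof x, &y, sizeof y) == 0;
}

// Hypothetical check using the inputs from the question.
inline void check_examples() {
  assert(bytes_match(131, 0));
  assert(bytes_match(132, 0));
  assert(bytes_match(500, 0));
  // Different inputs give different results, since no bytes are dropped.
  assert(hash8(131, 0) != hash8(132, 0));
}
```

The numeric value of the result still depends on the platform's byte order, so the check compares bytes rather than a fixed number.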

If you actually need a constexpr function, you can try using std::bit_cast from C++20 (this may be difficult if either input parameter does not have a size of 1, 2, 4, or 8).

I cannot help with the problem in your code due to lack of details, but I can propose a perhaps simpler solution.

Firstly, I recommend adding a check that the argument types have unique object representations. Unless that is satisfied, the hash would be meaningless: padding bytes, or values with more than one representation, could make two equal objects hash differently.
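To make that requirement concrete, here is a small sketch of which types pass the check. Packed and Padded are hypothetical example types, and the results shown are what mainstream compilers on common platforms report, not something the standard guarantees everywhere.

```cpp
#include <cstdint>
#include <type_traits> // std::has_unique_object_representations_v (C++17)

// Hypothetical example type: two adjacent 16-bit fields, no padding expected.
struct Packed {
  std::uint16_t a;
  std::uint16_t b;
};

// Hypothetical example type: padding is normally inserted after c to align i.
struct Padded {
  std::uint8_t c;
  std::uint32_t i;
};

// Plain integers and padding-free aggregates of them are fine.
static_assert(std::has_unique_object_representations_v<std::uint32_t>);
static_assert(std::has_unique_object_representations_v<Packed>);

// Padding bytes have unspecified contents, so equal objects may differ bytewise.
static_assert(!std::has_unique_object_representations_v<Padded>);

// Floating-point values such as +0.0 and -0.0 compare equal but differ bytewise.
static_assert(!std::has_unique_object_representations_v<float>);
```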

Secondly, std::memcpy might make this simpler:

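#include <cstdint>     // std::uint64_t
#include <cstring>     // std::memcpy
#include <memory>      // std::addressof
#include <type_traits> // std::has_unique_object_representations_v

template <typename T, typename U>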
auto
hash8(T x, U y) noexcept {
    static_assert(sizeof x + sizeof y <= sizeof(std::uint64_t));
    static_assert(std::has_unique_object_representations_v<T>);
    static_assert(std::has_unique_object_representations_v<U>);
    std::uint64_t ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    std::memcpy(ptr, std::addressof(x), sizeof x);
    ptr += sizeof x;
    std::memcpy(ptr, std::addressof(y), sizeof y);
    return ret;
}

Next, we can generalise this to an arbitrary number of arguments (so long as they fit) and to different return types:

template <typename R = std::uint64_t, typename... Args>
auto
hash(Args... args) noexcept {
    static_assert((sizeof args + ...) <= sizeof(R));
    static_assert((std::has_unique_object_representations_v<Args> && ...));
    static_assert(std::has_unique_object_representations_v<R>);
    R ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    (
        (
            std::memcpy(ptr, std::addressof(args), sizeof args),
            ptr += sizeof args
        ), ...
    );
    return ret;
}
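A brief usage sketch of the variadic version follows. The function is repeated so the example is self-contained and is renamed hash_bytes here purely to avoid confusion with std::hash. That rename and the check_hash_bytes helper are assumptions of this example, not part of the answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>     // std::memcpy
#include <memory>      // std::addressof
#include <type_traits> // std::has_unique_object_representations_v, std::is_same_v

// Copy of the variadic function above under a different (hypothetical) name.
template <typename R = std::uint64_t, typename... Args>
R hash_bytes(Args... args) noexcept {
    static_assert((sizeof(Args) + ...) <= sizeof(R));
    static_assert((std::has_unique_object_representations_v<Args> && ...));
    static_assert(std::has_unique_object_representations_v<R>);
    R ret{};
    auto ptr = reinterpret_cast<unsigned char*>(&ret);
    // Copy each argument in order, advancing the write position each time.
    ((std::memcpy(ptr, std::addressof(args), sizeof args), ptr += sizeof args), ...);
    return ret;
}

inline void check_hash_bytes() {
    // Three arguments: 1 + 2 + 4 = 7 bytes fit in the default 64-bit result.
    const auto h = hash_bytes(std::uint8_t{1}, std::uint16_t{2}, std::uint32_t{3});
    static_assert(std::is_same_v<decltype(h), const std::uint64_t>);
    assert(h != 0);

    // A smaller return type can be requested as long as the arguments fit.
    const auto small = hash_bytes<std::uint32_t>(std::uint16_t{1}, std::uint16_t{2});
    static_assert(std::is_same_v<decltype(small), const std::uint32_t>);
    assert(small != 0);

    // Argument order matters, because the bytes are copied in order.
    assert(hash_bytes(std::uint8_t{1}, std::uint8_t{2}) !=
           hash_bytes(std::uint8_t{2}, std::uint8_t{1}));
}
```

Passing arguments that do not fit, such as three std::uint32_t values with the default return type, is rejected at compile time by the first static_assert.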

One caveat: a hash such as this is not the same across different systems, even if the sizes of the objects match, because it depends on how each system lays out the bytes of an object (byte order, for example).
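Byte order is the most common reason for such a difference. The sketch below packs the same two 32-bit values and shows the two results one would expect on little-endian and big-endian machines. The names pack, is_little_endian and check_pack are invented for this illustration, and mixed-endian layouts are not considered.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring> // std::memcpy

// Packs two 32-bit values the same way the memcpy-based functions do.
inline std::uint64_t pack(std::uint32_t x, std::uint32_t y) {
  std::uint64_t r = 0;
  auto* p = reinterpret_cast<unsigned char*>(&r);
  std::memcpy(p, &x, sizeof x);
  std::memcpy(p + sizeof x, &y, sizeof y);
  return r;
}

// Hypothetical helper: true when the least significant byte is stored first.
inline bool is_little_endian() {
  const std::uint16_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

inline void check_pack() {
  const std::uint64_t h = pack(1, 2);
  if (is_little_endian()) {
    // Bytes in memory: 01 00 00 00 02 00 00 00
    assert(h == 0x0000000200000001ULL);
  } else {
    // Bytes in memory: 00 00 00 01 00 00 00 02
    assert(h == 0x0000000100000002ULL);
  }
}
```

If the value has to be stable across machines, for instance when it is written to a file, the bytes would need to be assembled in a fixed order instead.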

P.S. It's pointless to make your function constexpr, because it uses reinterpret casting (the C-style casts to uint8_t*), which isn't allowed in constant expressions.
