Type-punning a uint64_t as two uint32_t in C++20

Question

This code, which reads a uint64_t as two uint32_t, is UB because it violates the strict aliasing rule:

uint64_t v;
uint32_t lower = reinterpret_cast<uint32_t*>(&v)[0];
uint32_t upper = reinterpret_cast<uint32_t*>(&v)[1];

Likewise, this code, which writes the upper and lower parts of a uint64_t, is UB for the same reason:

uint64_t v;
uint32_t* lower = reinterpret_cast<uint32_t*>(&v);
uint32_t* upper = reinterpret_cast<uint32_t*>(&v) + 1;

*lower = 1;
*upper = 1;

How can one write this code in a safe and clean way in modern C++20, potentially using std::bit_cast?

Answer 1

Using std::bit_cast:

#include <bit>
#include <array>
#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 0x12345678'87654321ULL;
    // Convert one u64 -> two u32
    auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
    std::cout << std::hex << v[0] << " " << v[1] << std::endl;
    // Convert two u32 -> one u64
    auto y = std::bit_cast<uint64_t>(v);
    std::cout << std::hex << y << std::endl;
}

Output (on a little-endian machine):

87654321 12345678
1234567887654321

std::bit_cast is available only from C++20 onward. Before C++20 you can implement it by hand with std::memcpy, with the one caveat that such an implementation is not constexpr, unlike the C++20 version:

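#include <cstring>
#include <type_traits>

template <class To, class From>
inline To bit_cast(From const & src) noexcept {
    //return std::bit_cast<To>(src);
    static_assert(sizeof(To) == sizeof(From),
        "Source and destination must have the same size");
    static_assert(std::is_trivially_copyable_v<From>
        && std::is_trivially_copyable_v<To>,
        "Both types should be trivially copyable");
    static_assert(std::is_trivially_constructible_v<To>,
        "Destination type should be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}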
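
As a usage sketch, a memcpy-based helper like this can stand in for std::bit_cast when splitting the value on an older compiler. The names bit_cast_compat, split_u64 and join_u64 are made up for this illustration, and the element order of the result still depends on the platform's byte order.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical pre-C++20 stand-in for std::bit_cast (not constexpr).
template <class To, class From>
To bit_cast_compat(From const& src) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<From>::value,
                  "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "To must be trivially copyable");
    static_assert(std::is_trivially_constructible<To>::value,
                  "To must be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

// Splits in memory order: index 0 holds the low half only on little-endian.
inline std::array<std::uint32_t, 2> split_u64(std::uint64_t x) {
    return bit_cast_compat<std::array<std::uint32_t, 2>>(x);
}

// Reassembles a value from halves that are in memory order.
inline std::uint64_t join_u64(std::array<std::uint32_t, 2> const& halves) {
    return bit_cast_compat<std::uint64_t>(halves);
}
```

Splitting and then joining always round-trips, whatever the byte order, because both directions copy the same bytes.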

For this specific case of integers, plain bit shift/or arithmetic is a perfectly good way to convert one u64 to two u32 and back again. std::bit_cast is more generic, supporting any trivially copyable type, and on modern compilers with optimization enabled the std::bit_cast solution should be just as efficient as the bit arithmetic.

One extra benefit of bit arithmetic is that it handles endianness correctly, unlike std::bit_cast.

#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 0x12345678'87654321ULL;
    // Convert one u64 -> two u32
    uint32_t lo = uint32_t(x), hi = uint32_t(x >> 32);
    std::cout << std::hex << lo << " " << hi << std::endl;
    // Convert two u32 -> one u64
    uint64_t y = (uint64_t(hi) << 32) | lo;
    std::cout << std::hex << y << std::endl;
}

Output:

87654321 12345678
1234567887654321

Answer 2

"in a safe and clean way"

Do not use reinterpret_cast. Do not rely on unclear code that depends on specific compiler settings and fishy, uncertain behavior. Use exact arithmetic operations with well-defined results. Classes and operator overloads are all there waiting for you. For example, some global functions:

#include <cstdint>

struct UpperUint64Ref {
   uint64_t &v;
   UpperUint64Ref(uint64_t &v) : v(v) {}
   UpperUint64Ref operator=(uint32_t a) {
      v &= 0x00000000ffffffffull;
      v |= (uint64_t)a << 32;
      return *this;
   }
   // Reading through the proxy yields the upper half only
   operator uint32_t() const {
      return uint32_t(v >> 32);
   }
};
struct LowerUint64Ref { 
    uint64_t &v;
    LowerUint64Ref(uint64_t &v) : v(v) {}
    /* as above */
};
UpperUint64Ref upper(uint64_t& v) { return v; }
LowerUint64Ref lower(uint64_t& v) { return v; }

int main() {
   uint64_t v = 0;  // initialized: operator= reads v before writing it
   upper(v) = 1;
}

Or an interface object:

#include <cstdint>

struct Uint64Ref {
   uint64_t &v;
   Uint64Ref(uint64_t &v) : v(v) {}
   struct UpperReference {
       uint64_t &v;
       UpperReference(uint64_t &v) : v(v) {}
       UpperReference operator=(uint32_t a) {
           v &= 0x00000000ffffffffull;
           v |= (uint64_t)a << 32u;
           return *this;  // a non-void function must return a value
       }
   };
   UpperReference upper() {
      return v;
   }
   struct LowerReference {
       uint64_t &v;
       LowerReference(uint64_t &v) : v(v) {}
   };
   LowerReference lower() { return v; }
};
int main() {
   uint64_t v = 0;  // initialized: operator= reads v before writing it
   Uint64Ref r{v};
   r.upper() = 1;
}
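
The lower-half types above are left without accessors. A minimal sketch of how both halves might be completed with a single proxy template follows; HalfRef, its Shift parameter, and the read conversion to uint32_t are this sketch's own choices, not part of the answer.

```cpp
#include <cstdint>

// Hypothetical proxy for one 32-bit half of a uint64_t.
// Shift is 0 for the lower half and 32 for the upper half.
template <unsigned Shift>
struct HalfRef {
    std::uint64_t& v;
    explicit HalfRef(std::uint64_t& ref) : v(ref) {}

    // Write: clear the selected half, then OR in the new value.
    HalfRef& operator=(std::uint32_t a) {
        v &= ~(std::uint64_t{0xffffffffu} << Shift);
        v |= std::uint64_t{a} << Shift;
        return *this;
    }

    // Read: extract the selected half.
    operator std::uint32_t() const {
        return static_cast<std::uint32_t>(v >> Shift);
    }
};

inline HalfRef<32> upper(std::uint64_t& v) { return HalfRef<32>{v}; }
inline HalfRef<0> lower(std::uint64_t& v) { return HalfRef<0>{v}; }
```

With this in place, upper(v) = 1 and lower(v) = 2 on a zero-initialized v leave it equal to 0x0000000100000002, and uint32_t hi = upper(v) reads back 1. Only shifts and masks are involved, so the result does not depend on byte order.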

Answer 3

Using std::memcpy:

#include <cstdint>
#include <cstring>

void foo(uint64_t& v, uint32_t low_val, uint32_t high_val) {
    std::memcpy(reinterpret_cast<unsigned char*>(&v), &low_val,
                sizeof(low_val));
    std::memcpy(reinterpret_cast<unsigned char*>(&v) + sizeof(low_val),
                &high_val, sizeof(high_val));
}

int main() {
    uint64_t v = 0;
    foo(v, 1, 2);
}

With -O1, the compiler reduces foo to:

        mov     DWORD PTR [rdi], esi
        mov     DWORD PTR [rdi+4], edx
        ret

This means no extra copies are made; std::memcpy just serves as a hint to the compiler.
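
The same technique works for reading. The sketch below is an assumed counterpart to foo (the name read_halves is invented here). Like foo, it copies in memory order, so which of the two outputs is the low half depends on the platform's byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reading counterpart to foo: copies the first and the second
// four bytes of v into two uint32_t objects, in memory order.
void read_halves(const std::uint64_t& v, std::uint32_t& first,
                 std::uint32_t& second) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&v);
    std::memcpy(&first, bytes, sizeof(first));
    std::memcpy(&second, bytes + sizeof(first), sizeof(second));
}
```

On a little-endian machine, first receives the low 32 bits and second the high 32 bits; on a big-endian machine the roles are swapped.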

Answer 4

std::bit_cast alone is not enough, since the results vary with the endianness of the system.

Fortunately, <bit> also contains std::endian.

Keeping in mind that optimizers generally resolve at compile time any ifs that are always true or always false, we can just test the endianness and act accordingly.

We only know beforehand how to handle big or little endian. If the platform is neither, the bit_cast results are not decodable.

Another factor that can spoil things is padding. Using bit_cast assumes there is no padding between array elements.

So we can check that there is no padding and that the endianness is big or little to see whether the value is castable.

If it is not castable, we do a bunch of shifts as per the old method (this can be slow).
If the endianness is big, just return the results of bit_cast.
If the endianness is little, we need to reverse the order. This is not the same as C++23 std::byteswap, because we swap whole elements rather than bytes.

I arbitrarily decided that big-endian has the correct order, with the high bits at x[0].

#include <bit>
#include <array>
#include <cstddef>
#include <cstdint>
#include <concepts>

template <std::integral T>
auto split64(uint64_t x) {
    constexpr std::size_t BITS = sizeof(uint64_t) * 8;
    constexpr std::size_t ELEM = sizeof(uint64_t) / sizeof(T);
    constexpr std::size_t WIDTH = BITS / ELEM;   // bits per element
    constexpr std::size_t BASE = BITS - WIDTH;   // shift for the highest element
    constexpr uint64_t MASK = ~0ULL >> (BITS - WIDTH);
    using split = std::array<T, ELEM>;
    constexpr bool is_big = std::endian::native == std::endian::big;
    constexpr bool is_little = std::endian::native == std::endian::little;
    constexpr bool can_cast = (is_big || is_little)
        && (sizeof(uint64_t) == sizeof(split));

    // if constexpr discards the untaken branches at compile time,
    // so bit_cast is never instantiated when the sizes differ
    if constexpr (!can_cast)
    {
        split ret;
        for (std::size_t e = 0; e < ELEM; ++e)
        {
            // element 0 takes the highest WIDTH bits, then step down by WIDTH
            ret[e] = static_cast<T>((x >> (BASE - e * WIDTH)) & MASK);
        }
        return ret;
    }
    else if constexpr (is_big)
    {
        return std::bit_cast<split>(x);
    }
    else
    {
        split tmp = std::bit_cast<split>(x);
        split ret;
        for (std::size_t e = 0; e < ELEM; ++e)
        {
            ret[e] = tmp[ELEM - (e + 1)];
        }
        return ret;
    }
}

uint16_t tst(uint64_t x, int y)
{
    return split64<uint16_t>(x)[y];
}

I believe this should be defined behavior.

Answer 5

Don't bother, because arithmetic is faster anyway:

uint64_t v;
uint32_t lower = v;
uint32_t upper = v >> 32;

5 answers

Answer 1
7 Accepted 2021-11-10 10:22:10

Answer 2
3 2021-11-10 10:29:00

Answer 3
0 2021-11-10 10:23:43

Answer 4
0 2021-11-10 21:03:32

Answer 5
0 2021-11-14 02:57:45
