Type-pun uint64_t as two uint32_t in C++20

This code to read a uint64_t as two uint32_t is UB due to the strict aliasing rule:

uint64_t v;
uint32_t lower = reinterpret_cast<uint32_t*>(&v)[0];
uint32_t upper = reinterpret_cast<uint32_t*>(&v)[1];

Likewise, this code to write the upper and lower parts of a uint64_t is UB for the same reason:

uint64_t v;
uint32_t* lower = reinterpret_cast<uint32_t*>(&v);
uint32_t* upper = reinterpret_cast<uint32_t*>(&v) + 1;

*lower = 1;
*upper = 1;

How can one write this code in a safe and clean way in modern C++20, potentially using std::bit_cast?

Using std::bit_cast:

#include <bit>
#include <array>
#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 0x12345678'87654321ULL;
    // Convert one u64 -> two u32
    auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
    std::cout << std::hex << v[0] << " " << v[1] << std::endl;
    // Convert two u32 -> one u64
    auto y = std::bit_cast<uint64_t>(v);
    std::cout << std::hex << y << std::endl;
}

Output:

87654321 12345678
1234567887654321

std::bit_cast is available only since C++20. Before C++20 you can implement it manually with std::memcpy; the one difference is that such an implementation is not constexpr, unlike the C++20 version:

#include <cstring>
#include <type_traits>

template <class To, class From>
inline To bit_cast(From const & src) noexcept {
    //return std::bit_cast<To>(src);
    static_assert(sizeof(To) == sizeof(From),
        "Source and destination must have the same size");
    static_assert(std::is_trivially_copyable_v<From> && std::is_trivially_copyable_v<To>,
        "Both types must be trivially copyable");
    static_assert(std::is_trivially_constructible_v<To>,
        "Destination type should be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

For this specific case of integers, plain bit shifts and ORs are a perfectly good way to convert one u64 into two u32 and back again. std::bit_cast is more generic, supporting any trivially copyable types of equal size, and on modern compilers with optimization enabled it should perform as well as the bit arithmetic.

One extra benefit of bit arithmetic is that it handles endianness correctly, unlike std::bit_cast.

#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 0x12345678'87654321ULL;
    // Convert one u64 -> two u32
    uint32_t lo = uint32_t(x), hi = uint32_t(x >> 32);
    std::cout << std::hex << lo << " " << hi << std::endl;
    // Convert two u32 -> one u64
    uint64_t y = (uint64_t(hi) << 32) | lo;
    std::cout << std::hex << y << std::endl;
}

Output:

87654321 12345678
1234567887654321

in a safe and clean way

Do not use reinterpret_cast. Do not depend on unclear code that relies on specific compiler settings and fishy, uncertain behavior. Use exact arithmetic operations with well-defined results. Classes and operator overloads are all there waiting for you. For example, some global functions:

#include <cstdint>
#include <iostream>

struct UpperUint64Ref {
   uint64_t &v;
   UpperUint64Ref(uint64_t &v) : v(v) {}
   UpperUint64Ref& operator=(uint32_t a) {
      v &= 0x00000000ffffffffull;
      v |= (uint64_t)a << 32;
      return *this;
   }
   operator uint32_t() const {
      return static_cast<uint32_t>(v >> 32);  // read back only the upper half
   }
};
struct LowerUint64Ref { 
    uint64_t &v;
    LowerUint64Ref(uint64_t &v) : v(v) {}
    /* as above */
};
UpperUint64Ref upper(uint64_t& v) { return v; }
LowerUint64Ref lower(uint64_t& v) { return v; }

int main() {
   uint64_t v = 0;  // initialize: operator= reads v before writing it
   upper(v) = 1;
}

Or interface object:或接口对象:

#include <cstdint>
#include <iostream>

struct Uint64Ref {
   uint64_t &v;
   Uint64Ref(uint64_t &v) : v(v) {}
   struct UpperReference {
       uint64_t &v;
       UpperReference(uint64_t &v) : v(v) {}
       UpperReference& operator=(uint32_t a) {
           v &= 0x00000000ffffffffull;
           v |= (uint64_t)a << 32u;
           return *this;  // the original fell off the end without returning
       }
   };
   UpperReference upper() {
      return v;
   }
   struct LowerReference {
       uint64_t &v;
       LowerReference(uint64_t &v) : v(v) {}
   };
   LowerReference lower() { return v; }
};
int main() {
   uint64_t v = 0;  // initialize: operator= reads v before writing it
   Uint64Ref r{v};
   r.upper() = 1;
}

Using std::memcpy:

#include <cstdint>
#include <cstring>

void foo(uint64_t& v, uint32_t low_val, uint32_t high_val) {
    std::memcpy(reinterpret_cast<unsigned char*>(&v), &low_val,
                sizeof(low_val));
    std::memcpy(reinterpret_cast<unsigned char*>(&v) + sizeof(low_val),
                &high_val, sizeof(high_val));
}

int main() {
    uint64_t v = 0;
    foo(v, 1, 2);
}

With -O1, the compiler reduces foo to:

        mov     DWORD PTR [rdi], esi
        mov     DWORD PTR [rdi+4], edx
        ret

This means no extra copies are made; std::memcpy just serves as a hint to the compiler.
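
The example above only covers writing. Reading the two halves back can follow the same pattern. The following is a minimal sketch, where the helper name load_halves is hypothetical and not part of the original answer. As with foo, which half lands first depends on the byte order of the platform.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the two 32-bit halves of v out in memory order.
// On a little-endian machine element 0 is the low half; on a big-endian
// machine it is the high half.
std::array<uint32_t, 2> load_halves(const uint64_t& v) {
    std::array<uint32_t, 2> halves{};
    const auto* bytes = reinterpret_cast<const unsigned char*>(&v);
    std::memcpy(&halves[0], bytes, sizeof(uint32_t));
    std::memcpy(&halves[1], bytes + sizeof(uint32_t), sizeof(uint32_t));
    return halves;
}
```

Accessing the object through unsigned char* is allowed by the aliasing rules, so this stays within defined behavior.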

std::bit_cast alone is not enough, since the results vary with the endianness of the system.

Fortunately, <bit> also contains std::endian.

Conditions that are always true or always false are resolved at compile time (by the optimizer for a plain if, or guaranteed with if constexpr), so we can simply test the endianness and act accordingly.

We only know beforehand how to handle big or little endian. If the platform is neither, the bit_cast results are not decodable.

Another factor that can spoil things is padding. Using bit_cast assumes there is no padding between array elements.

So we check that there is no padding and that the endianness is either big or little to decide whether the value is castable.

  • If it is not castable, we do a bunch of shifts as per the old method. (This can be slow.)
  • If the endianness is big, just return the result of bit_cast.
  • If the endianness is little, we need to reverse the order. This is not the same as C++23 std::byteswap, because we swap whole elements rather than bytes.

I arbitrarily decided that big-endian has the correct order, with the high bits at x[0].

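#include <array>
#include <bit>
#include <concepts>
#include <cstddef>
#include <cstdint>

template <std::integral T>
auto split64(uint64_t x) {
    constexpr std::size_t BITS = sizeof(uint64_t) * 8;          // 64
    constexpr std::size_t ELEM = sizeof(uint64_t) / sizeof(T);  // number of elements
    constexpr std::size_t WIDTH = BITS / ELEM;                  // bits per element
    constexpr std::size_t BASE = BITS - WIDTH;                  // shift for element 0
    constexpr uint64_t MASK = ~0ULL >> (BITS - WIDTH);
    using split = std::array<T, ELEM>;
    constexpr bool is_big = std::endian::native == std::endian::big;
    constexpr bool is_little = std::endian::native == std::endian::little;
    constexpr bool can_cast = (is_big || is_little)
        && (sizeof(uint64_t) == sizeof(split));

    // if constexpr guarantees the branches are resolved at compile time
    // and keeps bit_cast from being instantiated when the sizes differ
    if constexpr (!can_cast)
    {
        split ret;
        for (std::size_t e = 0; e < ELEM; ++e)
        {
            // shift by whole elements (WIDTH bits), high bits first
            ret[e] = static_cast<T>((x >> (BASE - e * WIDTH)) & MASK);
        }
        return ret;
    }
    else
    {
        split tmp = std::bit_cast<split>(x);
        if constexpr (is_big)
        {
            return tmp;
        }
        else
        {
            // little-endian: reverse the element order
            split ret;
            for (std::size_t e = 0; e < ELEM; ++e)
            {
                ret[e] = tmp[ELEM - (e + 1)];
            }
            return ret;
        }
    }
}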

uint16_t tst(uint64_t x, int y)
{
    return split64<uint16_t>(x)[y];
}

I believe this should be defined behavior.

Don't bother, because arithmetic is faster anyway:

uint64_t v;
uint32_t lower = v;
uint32_t upper = v >> 32;
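
The snippet above covers reading. For the writing half of the question, the same arithmetic works with a mask. A minimal sketch, with with_lower and with_upper as hypothetical helper names:

```cpp
#include <cstdint>

// Hypothetical helpers: return v with one 32-bit half replaced.
constexpr uint64_t with_lower(uint64_t v, uint32_t lo) {
    return (v & 0xffffffff00000000ULL) | lo;
}

constexpr uint64_t with_upper(uint64_t v, uint32_t hi) {
    return (v & 0x00000000ffffffffULL) | (static_cast<uint64_t>(hi) << 32);
}

// The results do not depend on the byte order of the platform.
static_assert(with_lower(0x1111111122222222ULL, 1) == 0x1111111100000001ULL);
static_assert(with_upper(0x1111111122222222ULL, 1) == 0x0000000122222222ULL);
```

Usage would then look like v = with_upper(v, 1); in place of the *upper = 1; from the question.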
