Type-pun uint64_t as two uint32_t in C++20
This code, which reads a uint64_t as two uint32_t, is UB due to the strict aliasing rule:
uint64_t v;
uint32_t lower = reinterpret_cast<uint32_t*>(&v)[0];
uint32_t upper = reinterpret_cast<uint32_t*>(&v)[1];
Likewise, this code, which writes the upper and lower halves of a uint64_t, is UB for the same reason:
uint64_t v;
uint32_t* lower = reinterpret_cast<uint32_t*>(&v);
uint32_t* upper = reinterpret_cast<uint32_t*>(&v) + 1;
*lower = 1;
*upper = 1;
How can one write this code in a safe and clean way in modern C++20, potentially using std::bit_cast?
Using std::bit_cast:
#include <bit>
#include <array>
#include <cstdint>
#include <iostream>
int main() {
uint64_t x = 0x12345678'87654321ULL;
// Convert one u64 -> two u32
auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
std::cout << std::hex << v[0] << " " << v[1] << std::endl;
// Convert two u32 -> one u64
auto y = std::bit_cast<uint64_t>(v);
std::cout << std::hex << y << std::endl;
}
Output (on a little-endian machine):
87654321 12345678
1234567887654321
std::bit_cast is available only in C++20 and later. Prior to C++20 you can implement it manually via std::memcpy, with one caveat: unlike the C++20 version, such an implementation is not constexpr:
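#include <cstring>
#include <type_traits>
template <class To, class From>
inline To bit_cast(From const & src) noexcept {
    // Pre-C++20 stand-in for: return std::bit_cast<To>(src);
    static_assert(sizeof(To) == sizeof(From),
                  "Source and destination must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "Both types must be trivially copyable");
    static_assert(std::is_trivially_constructible<To>::value,
                  "Destination type should be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}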
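A usage sketch of the memcpy approach for both directions from the question, assuming C++11 or later. To keep it self-contained, the shim is repeated under the hypothetical name `bit_cast_compat` (so it cannot clash with std::bit_cast); `split_u64` and `join_u64` are also hypothetical helpers. As with std::bit_cast, the element order follows the platform's byte order.

```cpp
#include <array>
#include <cassert>   // assert, used in the usage example after this block
#include <cstdint>
#include <cstring>
#include <type_traits>

// Same memcpy-based idea as the shim above, under a hypothetical name.
template <class To, class From>
To bit_cast_compat(const From& src) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

// Read: one u64 -> two u32 (element order follows the platform's byte order).
inline std::array<uint32_t, 2> split_u64(uint64_t x) {
    return bit_cast_compat<std::array<uint32_t, 2>>(x);
}

// Write: two u32 -> one u64, using the same memory order.
inline uint64_t join_u64(std::array<uint32_t, 2> parts) {
    return bit_cast_compat<uint64_t>(parts);
}
```

For example, `assert(join_u64(split_u64(v)) == v);` holds for any v, because both directions use the same layout.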
For this specific case of integers, plain shift/or arithmetic is close to optimal for converting one u64 to two u32 and back again. std::bit_cast is more generic, supporting any trivially copyable types of the same size, and with a high level of optimization modern compilers should generate code for it that is just as good as the bit arithmetic.
One extra benefit of bit arithmetic is that, unlike std::bit_cast, it handles endianness correctly: shifts operate on values, not on the memory layout.
#include <cstdint>
#include <iostream>
int main() {
uint64_t x = 0x12345678'87654321ULL;
// Convert one u64 -> two u32
uint32_t lo = uint32_t(x), hi = uint32_t(x >> 32);
std::cout << std::hex << lo << " " << hi << std::endl;
// Convert two u32 -> one u64
uint64_t y = (uint64_t(hi) << 32) | lo;
std::cout << std::hex << y << std::endl;
}
Output:
87654321 12345678
1234567887654321
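Because shifts and masks operate on values rather than on memory, the same conversion also works in constant expressions and gives the same element order on every platform. A sketch, assuming C++14 or later; `split_u64` and `join_u64` are hypothetical names.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: value-based split, independent of byte order.
constexpr std::pair<uint32_t, uint32_t> split_u64(uint64_t x) {
    return {uint32_t(x),          // low 32 bits
            uint32_t(x >> 32)};   // high 32 bits
}

// Hypothetical helper: value-based join, the inverse of split_u64.
constexpr uint64_t join_u64(uint32_t lo, uint32_t hi) {
    return (uint64_t(hi) << 32) | lo;
}

// Checked at compile time; the results do not depend on the platform's endianness.
static_assert(split_u64(0x12345678'87654321ULL).first == 0x87654321u, "low half");
static_assert(split_u64(0x12345678'87654321ULL).second == 0x12345678u, "high half");
static_assert(join_u64(0x87654321u, 0x12345678u) == 0x12345678'87654321ULL, "join");
```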
in a safe and clean way
Do not use reinterpret_cast. Do not depend on unclear code that relies on specific compiler settings and fishy, uncertain behavior. Use exact arithmetic operations with well-defined results. Classes and operator overloads are all there waiting for you. For example, some global functions:
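#include <cstdint>
#include <iostream>

// Proxy referring to the upper 32 bits of a uint64_t.
struct UpperUint64Ref {
    uint64_t &v;
    UpperUint64Ref(uint64_t &v) : v(v) {}
    UpperUint64Ref& operator=(uint32_t a) {
        v &= 0x00000000ffffffffull;      // clear the upper half
        v |= (uint64_t)a << 32;          // store the new upper half
        return *this;
    }
    operator uint32_t() const { return uint32_t(v >> 32); }
};

// Proxy referring to the lower 32 bits of a uint64_t.
struct LowerUint64Ref {
    uint64_t &v;
    LowerUint64Ref(uint64_t &v) : v(v) {}
    LowerUint64Ref& operator=(uint32_t a) {
        v &= 0xffffffff00000000ull;      // clear the lower half
        v |= a;                          // store the new lower half
        return *this;
    }
    operator uint32_t() const { return uint32_t(v); }
};

UpperUint64Ref upper(uint64_t& v) { return v; }
LowerUint64Ref lower(uint64_t& v) { return v; }

int main() {
    uint64_t v = 0;
    upper(v) = 1;
    lower(v) = 2;
    std::cout << std::hex << v << '\n';  // prints 100000002
}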
Or an interface object:
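#include <cstdint>
#include <iostream>

struct Uint64Ref {
    uint64_t &v;
    Uint64Ref(uint64_t &v) : v(v) {}

    // Proxy for the upper 32 bits.
    struct UpperReference {
        uint64_t &v;
        UpperReference(uint64_t &v) : v(v) {}
        UpperReference& operator=(uint32_t a) {
            v &= 0x00000000ffffffffull;  // clear the upper half
            v |= (uint64_t)a << 32u;     // store the new upper half
            return *this;
        }
        operator uint32_t() const { return uint32_t(v >> 32); }
    };
    UpperReference upper() { return v; }

    // Proxy for the lower 32 bits.
    struct LowerReference {
        uint64_t &v;
        LowerReference(uint64_t &v) : v(v) {}
        LowerReference& operator=(uint32_t a) {
            v &= 0xffffffff00000000ull;  // clear the lower half
            v |= a;                      // store the new lower half
            return *this;
        }
        operator uint32_t() const { return uint32_t(v); }
    };
    LowerReference lower() { return v; }
};

int main() {
    uint64_t v = 0;
    Uint64Ref r{v};
    r.upper() = 1;
    r.lower() = 2;
    std::cout << std::hex << v << '\n';  // prints 100000002
}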
Using std::memcpy:
#include <cstdint>
#include <cstring>
// Stores low_val in the first 4 bytes of v and high_val in the next 4.
// On a little-endian target these are the low and high halves of v.
void foo(uint64_t& v, uint32_t low_val, uint32_t high_val) {
    std::memcpy(reinterpret_cast<unsigned char*>(&v), &low_val,
                sizeof(low_val));
    std::memcpy(reinterpret_cast<unsigned char*>(&v) + sizeof(low_val),
                &high_val, sizeof(high_val));
}
int main() {
    uint64_t v = 0;
    foo(v, 1, 2);
}
With -O1, the compiler reduces foo to:
mov DWORD PTR [rdi], esi
mov DWORD PTR [rdi+4], edx
ret
This means no extra copies are made; std::memcpy just serves as a hint to the compiler.
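The memcpy approach also covers reading the two halves back out. A sketch, assuming the same little-endian layout as the assembly above (low half at the lower address); `read_halves` is a hypothetical name. On a big-endian target the two values come out swapped.

```cpp
#include <cassert>   // assert, used in the usage example after this block
#include <cstdint>
#include <cstring>

// Hypothetical counterpart to foo: copy the two 32-bit halves out of v.
// Assumes little-endian layout: the first 4 bytes hold the low half.
void read_halves(const uint64_t& v, uint32_t& low_val, uint32_t& high_val) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&v);
    std::memcpy(&low_val, bytes, sizeof(low_val));
    std::memcpy(&high_val, bytes + sizeof(low_val), sizeof(high_val));
}
```

For example, after `foo(v, 1, 2)`, calling `read_halves(v, lo, hi)` gives back `lo == 1` and `hi == 2`, since both functions use the same byte offsets.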
std::bit_cast alone is not enough, since the results vary with the endianness of the system.
Fortunately, <bit> also contains std::endian.
Keeping in mind that optimizers generally resolve at compile time any if whose condition is always true or always false, we can simply test the endianness and act accordingly.
We only know in advance how to handle big- or little-endian layouts. If the platform is neither, the bit_cast results cannot be decoded.
Another factor that can spoil things is padding: using bit_cast assumes there is no padding between the array elements.
So we check that there is no padding and that the endianness is big or little to decide whether the value can be cast.
- big: just return the result of bit_cast.
- little: we need to reverse the order. This is not the same as C++23 std::byteswap, since we swap whole elements rather than bytes.

I arbitrarily decided that big-endian has the correct order, with the high bits at x[0].
#include <bit>
#include <array>
#include <cstddef>
#include <cstdint>
#include <concepts>
template <std::integral T>
auto split64(uint64_t x) {
    constexpr std::size_t BITS = sizeof(uint64_t) * 8;
    constexpr std::size_t ELEM = sizeof(uint64_t) / sizeof(T);
    constexpr std::size_t WIDTH = BITS / ELEM;            // bits per element
    constexpr uint64_t MASK = ~0ULL >> (BITS - WIDTH);
    using split = std::array<T, ELEM>;
    constexpr bool is_big = std::endian::native == std::endian::big;
    constexpr bool is_little = std::endian::native == std::endian::little;
    constexpr bool can_cast = (is_big || is_little)
                              && (sizeof(uint64_t) == sizeof(split));
    // if constexpr guarantees the untaken branches are discarded at compile
    // time, so std::bit_cast is only instantiated when the sizes match.
    if constexpr (!can_cast) {
        // Fallback: extract each element with shifts, high bits first.
        split ret;
        for (std::size_t e = 0; e < ELEM; ++e)
            ret[e] = T((x >> (BITS - WIDTH - e * WIDTH)) & MASK);
        return ret;
    } else {
        split tmp = std::bit_cast<split>(x);
        if constexpr (is_big) {
            return tmp;                 // memory order is already high-first
        } else {
            split ret;                  // little-endian: reverse the element order
            for (std::size_t e = 0; e < ELEM; ++e)
                ret[e] = tmp[ELEM - (e + 1)];
            return ret;
        }
    }
}
uint16_t tst(uint64_t x, int y)
{
    return split64<uint16_t>(x)[y];
}
I believe this should be defined behavior.
Don't bother; plain arithmetic is well defined and at least as fast anyway:
uint64_t v;
uint32_t lower = v;
uint32_t upper = v >> 32;
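The write direction from the question works the same way with masks. A sketch; `set_lower` and `set_upper` are hypothetical names.

```cpp
#include <cstdint>

// Replace the low 32 bits of v, keeping the high 32 bits.
constexpr uint64_t set_lower(uint64_t v, uint32_t lo) {
    return (v & 0xFFFFFFFF00000000ULL) | lo;
}

// Replace the high 32 bits of v, keeping the low 32 bits.
constexpr uint64_t set_upper(uint64_t v, uint32_t hi) {
    return (v & 0x00000000FFFFFFFFULL) | (uint64_t(hi) << 32);
}
```

For example, `set_upper(set_lower(0, 1), 1)` yields `0x0000000100000001`, the same value the question's two pointer writes were trying to produce.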