Below is a stripped-down example of a tagged union template "Storage", which holds one of two types L and R in a union, plus a bool indicating which of them is stored. The instantiation uses two types of different size, the smaller one actually being empty.
#include <utility>

struct Empty
{
};

struct Big
{
    long a;
    long b;
    long c;
};

template<typename L, typename R>
class Storage final
{
public:
    constexpr explicit Storage(const R& right) : payload{right}, isLeft{false}
    {
    }

private:
    union Payload
    {
        constexpr Payload(const R& right) : right{right}
        {
        }

        L left;
        R right;
    };

    Payload payload;
    bool isLeft;
};

// Toggle constexpr here
constexpr static Storage<Big, Empty> createStorage()
{
    return Storage<Big, Empty>{Empty{}};
}

Storage<Big, Empty> createStorage2()
{
    return createStorage();
}
The function "createStorage2" should therefore only populate the bool tag and leave the union alone. So I would expect the following compile result with default optimization "-O":
createStorage2():
mov rax, rdi
mov BYTE PTR [rdi+24], 0
ret
Both GCC and ICC instead generate something like
createStorage2():
mov rax, rdi
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+8], 0
mov QWORD PTR [rdi+16], 0
mov QWORD PTR [rdi+24], 0
ret
zeroing the entire 32-byte structure, while clang generates the expected code. You can reproduce this with https://godbolt.org/z/VsDQUu . GCC reverts to the desired initialization of the bool tag only when you remove constexpr from the "createStorage" static function, while ICC remains unimpressed and still fills all 32 bytes.
Doing so is probably not a standard violation, as the unused bytes being "undefined" allows anything, including setting them to zero and consuming unnecessary CPU cycles. But it's annoying if you introduced the union for efficiency reasons in the first place and your union members vary widely in size.
What is going on here? Is there any way to work around this behavior, provided that removing constexpr from the constructors and the static function is not an option?
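For reference, the 32-byte size and the tag offset of 24 seen in the listings above follow from the object layout. The sketch below is only an illustration: "StorageLayout", "tagOffset" and "paddingAfterTag" are hypothetical names that do not appear in the original code, and the struct is a public-member mirror of Storage<Big, Empty> so that offsetof can be applied. The concrete numbers in the comments assume an 8-byte long, as on x86-64 Linux.

```cpp
#include <cstddef>

struct Empty
{
};

struct Big
{
    long a;
    long b;
    long c;
};

// Hypothetical mirror of Storage<Big, Empty> with public members,
// used only to inspect the layout (not part of the original code).
struct StorageLayout
{
    union Payload
    {
        Big left;
        Empty right;
    };

    Payload payload;
    bool isLeft;
};

// The union is as large as its largest member (24 bytes with 8-byte long).
static_assert(sizeof(StorageLayout::Payload) == sizeof(Big),
              "union should have the size of Big");

// The tag sits directly behind the union (offset 24 with 8-byte long).
static_assert(offsetof(StorageLayout, isLeft) == 3 * sizeof(long),
              "tag should follow the three longs");

// Trailing padding rounds the size up to a multiple of the alignment
// of long (32 bytes with 8-byte long).
static_assert(sizeof(StorageLayout) == 4 * sizeof(long),
              "struct should span four longs");

// Offset of the tag, i.e. the only byte that needs to be written.
constexpr std::size_t tagOffset()
{
    return offsetof(StorageLayout, isLeft);
}

// Number of padding bytes behind the tag.
constexpr std::size_t paddingAfterTag()
{
    return sizeof(StorageLayout) - tagOffset() - sizeof(bool);
}
```

Under these assumptions, the four QWORD stores in the GCC and ICC output cover the 24 bytes of the union, the tag byte, and the trailing padding.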
A side note: ICC seems to perform some extra operations even when all constexpr are removed, as in https://godbolt.org/z/FnjoPC :
createStorage2():
mov rax, rdi #44.16
mov BYTE PTR [-16+rsp], 0 #39.9
movups xmm0, XMMWORD PTR [-40+rsp] #44.16
movups xmm1, XMMWORD PTR [-24+rsp] #44.16
movups XMMWORD PTR [rdi], xmm0 #44.16
movups XMMWORD PTR [16+rdi], xmm1 #44.16
ret #44.16
What is the purpose of these movups instructions?
(This is just speculation of mine, but it's too long for a comment)
What is going on here?
Since constructors are constexpr, it could be that the Payload as a whole has some value computed at compile-time. Then, at runtime, that complete Payload is returned. To my knowledge, it is not required for a compiler to recognize that a certain portion of a compile-time value is uninitialized and that it should generate no code for it.
In some crazy compiler it could even happen that the compile-time Payload has garbage values in an uninitialized section, and then it would produce for example:
createStorage2():
mov rax, rdi
mov QWORD PTR [rdi], 0xbaadf00d
mov QWORD PTR [rdi+8], 0xbaadf00d
mov QWORD PTR [rdi+16], 0xbaadf00d
mov QWORD PTR [rdi+24], 0
ret
In general, constexpr doesn't like uninitialized values, but unions are a bit of a way around that.
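To illustrate that last point, here is a minimal sketch, assuming a simplified, non-template copy of the question's types. "Tagged", "compileTimeValue" and "tagOf" are hypothetical names introduced only for this example. It shows that an object whose union has just one active member is accepted as a compile-time constant even though the larger member never receives a value. It does not show how any particular compiler stores such a constant internally.

```cpp
struct Empty
{
};

struct Big
{
    long a;
    long b;
    long c;
};

union Payload
{
    // Only 'right' becomes the active member; 'left' is never initialized.
    constexpr Payload(const Empty& r) : right(r)
    {
    }

    Big left;
    Empty right;
};

// Hypothetical non-template stand-in for Storage<Big, Empty>.
struct Tagged
{
    constexpr explicit Tagged(const Empty& r) : payload{r}, isLeft{false}
    {
    }

    Payload payload;
    bool isLeft;
};

// Accepted as a constant expression although the 24 bytes of 'left'
// were never given a value.
constexpr Tagged compileTimeValue{Empty{}};
static_assert(!compileTimeValue.isLeft, "tag should say 'right'");

// Reading the inactive member is not allowed in a constant expression,
// so the following line is expected to be rejected if uncommented:
// constexpr long bad = compileTimeValue.payload.left.a;

// Small accessor so the tag can also be checked at runtime.
constexpr bool tagOf(const Tagged& t)
{
    return t.isLeft;
}
```

The compile-time value is therefore well-defined only for the active member and the tag. What a compiler writes into the remaining bytes when it materializes that value at runtime is left to the implementation.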