Do modern C++ compilers optimize assignments after type casting?

Take the following code:

char chars[4] = {0x5B, 0x5B, 0x5B, 0x5B};
int* b = (int*) &chars[0];

The (int*) &chars[0] value is going to be used in a loop (a long loop). Is there any advantage in using (int*) &chars[0] over b in my code? Is there any overhead in creating b ? Since I only want to use it as an alias and improve code readability.

Also, is it OK to do this kind of type casting as long as I know what I'm doing? Or should I always memcpy() to another array with the correct type and use that? Would I encounter any kind of undefined behavior? because in my testing so far, it works, but I've seen people discouraging this kind of type casting.

is it OK to do this kind of type casting as long as I know what I'm doing?

No, this is not OK. This is not safe. The C++ standard does not allow it. You can access the object representation of an object (i.e. cast an object pointer to char*), although the result depends on the target platform (due to endianness and padding). However, you cannot do the opposite safely (i.e. without undefined behaviour).

More specifically, the int type can have stricter alignment requirements (typically 4 or 8 bytes) than char (which has no alignment requirement beyond 1 byte). Thus, your array is likely not aligned for int, and dereferencing b causes undefined behaviour. Note that this can cause a crash on some processors (AFAIK, POWER for example), although mainstream x86-64 processors support unaligned accesses. Moreover, compilers can assume that b is aligned in memory (to alignof(int)).
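To make the alignment point concrete, here is a minimal sketch. The names is_aligned_for_int and Buffers are hypothetical, and the check assumes the usual implementation where converting a pointer to std::uintptr_t yields its numeric address. Note that alignas only addresses alignment: reading the aligned char array through an int* is still not permitted by the aliasing rules.

```cpp
#include <cstdint>

// Hypothetical helper: true if p sits on an address that is a multiple of
// alignof(int). Assumes uintptr_t holds the numeric address (true on
// mainstream platforms, but implementation-defined).
inline bool is_aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}

// Hypothetical layout for illustration only.
struct Buffers {
    char tag;                      // occupies offset 0
    char plain[4];                 // starts at offset 1: no alignment guarantee for int
    alignas(int) char aligned[4];  // explicitly placed on an int boundary
};

static_assert(alignof(char) == 1, "char has the weakest alignment");
static_assert(alignof(Buffers) >= alignof(int), "alignas raises the struct alignment");
```

On a platform where alignof(int) is greater than 1, is_aligned_for_int(b.plain) is false for any Buffers b, while is_aligned_for_int(b.aligned) is always true.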

Or should I always memcpy() to another array with the correct type and use that?

Yes, or use alternative C++ operations like std::bit_cast, available since C++20. Do not worry about performance: most compilers (GCC, Clang, ICC, and certainly MSVC) optimize such operations (called type punning).
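A minimal sketch of the memcpy approach follows. The helper name load_u32 is hypothetical, and std::uint32_t is used instead of int so the size is fixed at 4 bytes, matching the 4-element array from the question.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a 32-bit value from the first four bytes of a
// char buffer. memcpy has no alignment or aliasing requirements, so this is
// well defined for any char pointer that has at least 4 readable bytes.
inline std::uint32_t load_u32(const char* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

With the array from the question, load_u32(chars) yields 0x5B5B5B5B on any platform because all four bytes are equal. For buffers with differing bytes, the result depends on endianness.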

Would I encounter any kind of undefined behavior?

As said before, yes, as long as the type punning is not done correctly.

because in my testing so far, it works, but I've seen people discouraging this kind of type casting.

It often works on simple examples on x86-64 processors. However, in a large code base, compilers do perform surprising optimizations (that are totally correct with respect to the C++ standard). To quote cppreference: "Compilers are not required to diagnose undefined behaviour (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.". Such issues are very hard to debug, as they generally only appear when optimizations are enabled and only in some specific cases. The result of a program can change depending on whether functions are inlined, which depends on compiler heuristics. In your case, the behaviour depends on the alignment of the stack, which in turn depends on compiler optimizations and on the variables declared and used in the current scope. Some processors do not support unaligned accesses (e.g. accesses that cross a cache line boundary), which results in a hardware exception or data corruption.

In short, "it works" so far does not mean it will always work everywhere.

The (int*) &chars[0] value is going to be used in a loop (a long loop). Is there any advantage in using (int*) &chars[0] over b in my code? Is there any overhead in creating b? Since I only want to use it as an alias and improve code readability.

Assuming you use a correct way to do type punning (e.g. memcpy), optimizing compilers can fully optimize this initialization as long as optimization flags are enabled. You should not worry about that unless you find that the generated code is poorly optimized. Correctness matters more than performance.
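As an illustration of using a named value in a long loop, here is a minimal sketch. The function xor_with_pattern and its parameters are hypothetical, not taken from the question. The value is built once with memcpy before the loop and then used as a plain local variable, which gives the readability of an alias without a casted pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical example: XOR every 32-bit word of a buffer with a 4-byte pattern.
inline void xor_with_pattern(std::uint32_t* data, std::size_t count,
                             const char (&pattern)[4]) {
    std::uint32_t key;                       // named value instead of (int*) &pattern[0]
    std::memcpy(&key, pattern, sizeof key);  // done once, outside the loop
    for (std::size_t i = 0; i < count; ++i) {
        data[i] ^= key;                      // the loop body only sees an ordinary integer
    }
}
```

Whether the copy is compiled down to a single load depends on the compiler and flags, so checking the generated assembly is the way to confirm it for a given build.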

If code performs multiple discrete byte operations using a pointer derived from a pointer to a type which requires word alignment, clang will sometimes replace the discrete writes with a word write that would succeed if the original object was aligned for that word type, but would fail on systems that don't support unaligned accesses if the object isn't aligned the way the compiler expects.

Among other things, this means that if one casts a pointer to T into a pointer to a union containing T, code which attempts to use the union pointer to access the original type may fail if the union contains any types that require an alignment stricter than the original type, even if the union is only accessed via the member of the original type.

AFAIK, a C compiler does not insert any code when casting a pointer, which means that both chars and b are just memory addresses. Normally a C++ compiler should compile this in the same way as a C compiler; this is the reason C++ has different, more advanced, casting semantics.

But you can always compile this and then disassemble it in gdb to see for yourself.

Otherwise, as long as you are aware of the endianness problems or potentially different int sizes on exotic platforms, your casting is safe.

See this question also: In C, does casting a pointer have overhead?
