
Should I prefer to use small integer types (int8 and int16) in C++ code?

I'm working on a C++/Qt project for Embedded Linux where we are constantly "duelling" against the limitations of our processor, especially when it comes to updating the graphs in the user interface. Because of those limitations (and especially our situation some time ago, when things were even worse), I try to optimize the code whenever I can and whenever the cost of the optimization is minimal. One such optimization I have been doing is to always use the smallest integer type that fits the situation I'm handling: qint8, qint16 or qint32, depending on how big the value I need is.

But some time ago I read somewhere that instead of trying to use the minimal size of integer whenever possible, I should always prefer the integer type matching the capacity of my processor, that is, if my processor is 32-bit, then I should prefer to use qint32 even when such a big integer isn't required. At first I couldn't understand why, but the answer to this question suggests that it is because the processor performs better when it works with its "default integer size".

Well, I'm not convinced. First of all, no actual reference was provided confirming that thesis: I just can't understand why reading and writing a smaller integer in a 32-bit memory space would be slower than doing it with a full 32-bit integer (and the explanation given wasn't very comprehensible, by the way). Second, there are moments in my app when I need to transfer data from one side to the other, such as when using Qt's signals and slots mechanism. Since I'm transferring data from one point to another, shouldn't smaller data always be an improvement over bigger data? I mean, isn't a signal sending two chars (not by reference) supposed to do its work quicker than one sending two 32-bit integers?

In fact, while the "processor explanation" suggests following the characteristics of your processor, other cases suggest the opposite. For example, when dealing with databases, this and this thread both suggest that there is an advantage (even if only in some cases) in using smaller integer types.

So, after all, should I prefer to use small integer types when the context allows, or not? Or is there a list of cases where one approach or the other is more likely to give better or worse results (e.g. I should use int8 and int16 when dealing with databases, but the default type of my processor in all other situations)?

And as a last question: Qt normally has int-based implementations of its functions. In such cases, doesn't the cast operation annihilate any possible improvement that one could gain by using smaller integers?

This question is really too broad without specifying a specific CPU: some 32-bit CPUs have plenty of instructions for handling smaller types, some don't. Some 32-bit CPUs handle misaligned access smoothly, some produce slower code because of it, and some halt and catch fire when they encounter it.


That being said, first of all there is the case of standard integer promotion, present in every C and C++ program, which will implicitly convert all small integer types you use into int.

The compiler is free to use integer promotion as specified in the standard, or to optimize it away, whichever leads to the most effective code, as long as the results are the same as for non-optimized code.

Implicit promotion may create more effective code, but it may also create subtle, disastrous bugs with unexpected type and signedness changes if the programmer is not aware of how the various implicit type promotion rules work. Sadly, plenty of would-be C and C++ programmers are not. When using smaller integer types, you need to be a much more competent/awake programmer than if you just use 32-bit variables all over the place.

So if you are reading this but have never heard of the integer promotion rules or the usual arithmetic conversions ("balancing"), then I would strongly suggest that you immediately stop any attempt at manually optimizing integer sizes and go read up on those implicit promotion rules instead.
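To make the kind of surprise meant above concrete, here is a minimal sketch (assuming, as on virtually every platform, that int is wider than 8 bits):

#include <cstdint>
#include <iostream>

int main()
{
    std::uint8_t a = 0;
    std::uint8_t b = 1;

    // Both operands are promoted to int before the subtraction, so the
    // result is a plain int with value -1, not a uint8_t that wraps to 255.
    auto diff = a - b;
    std::cout << diff << '\n';   // prints -1

    // The promoted, signed result also makes this condition true, which
    // surprises people expecting "unsigned minus unsigned" behaviour.
    if (a - b < 0)
        std::cout << "the subtraction was done in int\n";
}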


If you are aware of all the implicit promotion rules, then you can do manual optimization by using smaller integer types. But use the ones which give the compiler the most flexibility. Those are:

#include <stdint.h>

int_fast8_t
int_fast16_t
uint_fast8_t
uint_fast16_t

When these types are used, the compiler is free to change them for a larger type if that would yield faster code.

The difference between using the above types and just relying on integer promotion/expression optimization is that with the fast types the compiler can not only decide which type suits the CPU registers best for a given calculation, but also decide the memory consumption and alignment when the variables are allocated.
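As a small sketch of what "free to change them for a larger type" means in practice (the printed sizes are implementation-defined and will vary between targets and C libraries):

#include <cstdint>
#include <iostream>

int main()
{
    // The only guarantee is "at least 8/16 bits"; the implementation picks
    // whatever width it considers fastest for the target.
    std::cout << "int_fast8_t : " << sizeof(std::int_fast8_t)  << " bytes\n";
    std::cout << "int_fast16_t: " << sizeof(std::int_fast16_t) << " bytes\n";
    std::cout << "int16_t     : " << sizeof(std::int16_t)      << " bytes\n";
}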

One strong argument against using small variables is that, when mapped to registers (assuming they're not expanded implicitly), they may cause unintended false dependencies if your ISA uses partial registers. Such is the case with x86, since some old programs still use AH or AX and their counterparts as 8/16-bit registers. If your register has some value stuck in its upper part (due to a previous write to the full register), your CPU may be forced to carry it along and merge it with any partial value you calculate in order to maintain correctness, creating serial chains of dependencies even if your calculations were independent.
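A hedged sketch of how that can look from C++ (whether the compiler actually emits partial-register writes or extra masking depends entirely on the compiler and target, so inspect the generated assembly if it matters to you):

#include <cstdint>

// With an 8-bit accumulator, the compiler must keep the result truncated to
// 8 bits on every iteration, which on some ISAs means writing only the low
// part of a register (e.g. AL on x86) or inserting extra masking.
std::uint8_t sum_narrow(const std::uint8_t* p, int n)
{
    std::uint8_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += p[i];
    return acc;
}

// With a full-width accumulator there are no partial writes; the value is
// truncated once at the end, giving the same result modulo 256.
std::uint8_t sum_wide(const std::uint8_t* p, int n)
{
    unsigned acc = 0;
    for (int i = 0; i < n; ++i)
        acc += p[i];
    return static_cast<std::uint8_t>(acc);
}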

The memory claim raised by the answer you linked also holds, although I find it a bit weaker: it's true that the memory subsystem usually works at full cache-line granularity (often 64 bytes these days) and then rotates and masks, but that alone should not cause a performance impact; if anything, it improves performance when your data access patterns exhibit spatial locality. Smaller variables may in some cases also increase the risk of alignment issues, especially if you pack variables of different sizes closely, but most compilers know better than that (unless explicitly forced not to).

I think the main problem with small variables in memory would again be an increased chance of false dependencies: the merging is done implicitly by the memory system, but if other cores (or sockets) are sharing some of your variables, you're running the risk of knocking the entire line out of the cache.
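A rough sketch of that false-sharing concern (64 is a common cache-line size, but it is an assumption here, not a guarantee):

#include <atomic>
#include <cstdint>

// Two logically independent counters packed into the same cache line will
// make cores invalidate each other's copy of that line on every write.
struct PackedCounters {
    std::atomic<std::uint32_t> a;
    std::atomic<std::uint32_t> b;   // very likely on the same line as 'a'
};

// Aligning each counter to its own (assumed) 64-byte line avoids that, at
// the cost of extra memory.
struct PaddedCounters {
    alignas(64) std::atomic<std::uint32_t> a;
    alignas(64) std::atomic<std::uint32_t> b;
};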

The concept of always using int is good for temporary values (like a loop variable) in general, because they are likely to be promoted to int for many operations or library calls anyway.

When it comes to storing large amounts of data, especially in arrays, using a smaller type is much better. The question is how much counts as "large", and that is unfortunately situational.

Structure padding will also give you some wiggle room where you can use a full int for free. For example, if a struct has three shorts, the most used one might pay off being an int. As an aside, you should sort your members by size to avoid unneeded gaps due to padding.
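A sketch of that padding effect (the exact sizes are ABI-dependent; the values in the comments are what a typical 32/64-bit Linux ABI with a 4-byte int produces):

#include <cstdint>

struct Unsorted {
    std::uint8_t  a;   // 1 byte + 3 bytes of padding before 'b'
    std::uint32_t b;   // 4 bytes
    std::uint8_t  c;   // 1 byte + 3 bytes of tail padding
};                     // typically sizeof(Unsorted) == 12

struct Sorted {
    std::uint32_t b;   // 4 bytes
    std::uint8_t  a;   // 1 byte
    std::uint8_t  c;   // 1 byte + 2 bytes of tail padding
};                     // typically sizeof(Sorted) == 8

// Sorting members largest-first never makes the struct bigger.
static_assert(sizeof(Sorted) <= sizeof(Unsorted), "check member ordering");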

The unfortunate answer, especially if you are in a resource-constrained environment like Embedded Linux, is to test. Sometimes it will be worth the space, sometimes it won't.

In general, there is little use in premature optimization. For local variables and smaller classes and structs, there is little to no gain in using non-native types. Depending on the procedure call standard, packing/unpacking smaller types into a single register might even add more code than the word-size types cost.

For larger arrays and list/tree nodes (in other words, larger data structures), however, things can be different. There it might be worth using appropriately sized types rather than the natural one, using C-style structs without methods, etc. For most "modern" (since the end of the last century) Linux-compatible architectures, there is mostly no penalty for smaller integer types. For floating-point types, there might be architectures that only support float in hardware, not double, or where double takes longer to process. For these, using the smaller type not only reduces the memory footprint, but is also faster.
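For illustration, a rough back-of-the-envelope of the footprint difference for a large data structure (the sizes in the comments assume the common 4-byte float and 8-byte double):

#include <cstddef>

struct SampleF { float  x, y, z; };   // typically 12 bytes per node
struct SampleD { double x, y, z; };   // typically 24 bytes per node

constexpr std::size_t kNodes  = 1000000;
constexpr std::size_t kBytesF = kNodes * sizeof(SampleF);  // ~12 MB
constexpr std::size_t kBytesD = kNodes * sizeof(SampleD);  // ~24 MB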

Instead of shrinking the types of members/variables, it can be worth optimizing the class hierarchy, or even using native C code (or C-style coding) for some parts. Things like virtual methods or RTTI can be quite costly: the former uses large jump tables, the latter adds descriptors for each class.

Note that some of these statements assume code and data reside in RAM, as is typical for (embedded) Linux systems. If code/constants are stored in Flash, for example, you have to weigh the statements by their impact on the respective memory type.

It depends on what you are doing. If each item goes through some math, then depending on the instruction set the processor may have to mask off and sign-extend anything smaller than a register, whereas if you match the register size you won't have that problem or the extra code it generates every time. I assume that is why int and the other standard C/C++ types are size-independent; use them for that reason.

There is also the issue of alignment. Newer processors are better at it, but no matter what, if you are unaligned to the RAM-bus or processor-bus boundary, you can/will burn extra clock cycles even if you match the processor's word size. Also, depending on the processor and all the controller logic that spreads out from it, multiple byte accesses within the same area may result in separate bus cycles, even if the processor can sign-extend or zero-extend for you within the instruction. Of course, the processor itself or the buses that extend out from it may optimize (you just read that word, here is a byte from it), and then you hit the cache if you have one, so a single byte read can/will result in multiple words being fetched and stored; even if the processor does require multiple cycles, they are shorter and faster. Small efforts to keep things aligned on 4- or 8-byte (or larger) boundaries help performance on almost all platforms, provided you move data in chunks of those sizes or larger from your high-level-code perspective.
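A small sketch of "keeping things aligned" from the high-level side (alignas is standard C++11; the 8-byte boundary is an assumed bus/word size, not a universal rule):

#include <cstddef>
#include <cstdint>
#include <cstring>

// Keeping the buffer aligned to an 8-byte boundary lets the compiler and the
// memory system move it in full words instead of byte-by-byte.
alignas(8) static std::uint8_t rx_buffer[256];

void copy_packet(const std::uint8_t* src, std::size_t len)
{
    // memcpy to an aligned destination can usually be lowered to wide
    // loads/stores; an unaligned destination may cost extra cycles.
    std::memcpy(rx_buffer, src, len);
}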

For blobs of data that you are mostly just moving around rather than processing as individual values, burning 32 bits to store 8 or 16 is obviously a huge waste of RAM. That also wastes cache if you have one, and performance goes down because the cache space could have been used for other things, or could have held the same things longer.
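A minimal illustration of that waste (the same number of 8-bit samples, four times the memory and cache footprint when stored in 32-bit slots):

#include <cstdint>
#include <vector>

int main()
{
    std::vector<std::uint8_t>  packed(65536);    // 64 KiB of payload
    std::vector<std::uint32_t> wasteful(65536);  // 256 KiB for the same 8-bit values
    // Only a quarter of 'wasteful' fits in the cache space that holds all of 'packed'.
    return 0;
}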

You are getting into the realm of premature optimization. You can adopt some habits to avoid wasting a few instructions here and there (use unsigned int or int for loop variables, for example, and let the compiler pick the size; build structures with the aligned/larger items first and the smaller items last), but first get your code working and debugged, isolate the performance problems if any, and then weigh optimizing for a platform/compiler against portability and maintainability. Or simply keep a high-level-language version of a function and maintain an alternate hand-tuned assembly version for a specific platform, so you can always go back (you can even just compile and then tweak/fix the asm). But again, only do this if you really, really need that extra performance, have isolated a low-performance section, and are willing to pay the price for it.

For information about integral promotion:

http://en.cppreference.com/w/cpp/language/implicit_cast

Prvalues of small integral types (such as char) may be converted to prvalues of larger integral types (such as int). In particular, arithmetic operators do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable. This conversion always preserves the value.
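The quoted rule can be observed directly through the type system (C++17 shown for is_same_v; use std::is_same<...>::value on older compilers):

#include <type_traits>

// Arithmetic on char or short operands is performed in int: the result type
// of the expression is int, not the operand type.
static_assert(std::is_same_v<decltype(char{} + char{}),   int>, "char + char is done in int");
static_assert(std::is_same_v<decltype(short{} + short{}), int>, "short + short is done in int");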
