
Performance of DWORD vs QWORD alignment in 32 bit code

I have a lot of objects that inherit from each other.

By default, Embarcadero C++ Builder 2009 sets Data alignment in the project properties to QWORD. If I change this to DWORD, many of my objects shrink, because they often have 4 bytes of padding to spare, and the same happens again in each inheriting object. Accumulated over the hierarchy, this has a good effect.
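A minimal sketch of this effect, assuming the project setting behaves roughly like `#pragma pack`, and that `double` is 8 bytes and `int` is 4 bytes. The type names and the `bytes_saved` helper are hypothetical, not taken from the actual code base:

```cpp
#include <cstddef>

// Roughly what the "QWORD" data-alignment setting is assumed to mean.
#pragma pack(push, 8)
struct Base8 {
    double value;   // 8 bytes
    int    id;      // 4 bytes, typically followed by 4 bytes of tail padding
};
struct Derived8 : Base8 {
    int extra;      // 4 bytes, typically followed by 4 more bytes of padding
};
#pragma pack(pop)

// Roughly what the "DWORD" data-alignment setting is assumed to mean.
#pragma pack(push, 4)
struct Base4 {
    double value;   // 8 bytes
    int    id;      // 4 bytes, no tail padding: 12 is a multiple of 4
};
struct Derived4 : Base4 {
    int extra;      // lands directly after the 12-byte base
};
#pragma pack(pop)

// Bytes saved when `count` objects are stored back to back.
constexpr std::size_t bytes_saved(std::size_t count) {
    return count * (sizeof(Derived8) - sizeof(Derived4));
}
```

On compilers that align `double` to 8 bytes, `Derived8` typically comes out at 24 bytes and `Derived4` at 16, so the saving is 8 bytes per object after one level of inheritance. The exact numbers depend on the compiler and ABI.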

Shrinking them is appealing since sometimes I need to allocate millions of them in memory.

I would like to know why QWORD is the default for a 32-bit application. I expected it to be DWORD. Will changing it to DWORD create performance issues?

Also, since I allocate lots of them in memory: are they packed tightly together, one after the other, or is there padding between them as well? And does that padding also depend on the project setting (Data Alignment: QWORD / DWORD)? If the objects are all allocated on QWORD boundaries anyway, changing the objects' actual sizes won't have a net effect.

Overall, there are a number of things to consider for alignment:

First, according to the Wikipedia page on Data Structure Alignment, Embarcadero might be a bit of an exception if it aligns all objects to 8-byte boundaries. The article claims that GCC, VC++ and Borland's compiler don't align data to 8 bytes unless it's a double or long long.

A number of things do force alignment:

  • malloc and operator new will give you memory regions that are 8-byte aligned. Also, if you perform separate calls to malloc or new, the objects won't be nicely packed together. There will typically be at least 8 or so bytes between them for allocator metadata. There is also no guarantee that the objects are close to each other in memory.
  • Stack frames are aligned to 8 or 16 bytes, depending on the architecture.
  • Aligned SSE instructions need 16-byte aligned data.
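The allocator point can be sketched as follows. This assumes an allocator that rounds every block up to a multiple of 8 bytes; real allocators differ in rounding and per-block overhead. `is_aligned`, `round_up`, `Small` and `new_returns_8_byte_aligned` are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// True if `address` is a multiple of `alignment` (must be a power of two).
constexpr bool is_aligned(std::uintptr_t address, std::size_t alignment) {
    return (address & (alignment - 1)) == 0;
}

// Rounds `size` up to the next multiple of `alignment` (a power of two).
// Models an allocator that hands out blocks in 8-byte steps.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical 12-byte object.
struct Small { int a, b, c; };

// Allocates one object with operator new and reports whether the
// returned pointer sits on an 8-byte boundary.
bool new_returns_8_byte_aligned() {
    Small* p = new Small{1, 2, 3};
    const bool ok = is_aligned(reinterpret_cast<std::uintptr_t>(p), 8);
    delete p;
    return ok;
}
```

Under that assumption, `round_up(12, 8)` and `round_up(16, 8)` are both 16: shrinking an object from 16 to 12 bytes would not reduce the memory used when each object is allocated separately.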

Regarding performance: I don't think you'll see a large difference between 4-byte and 8-byte alignment. Daniel Lemire measured this and found small, if any, differences between 1-byte and 4-byte aligned data; I expect them to be even smaller between 4 and 8 bytes.

Probably the biggest difference in performance and memory usage in your scenario would come from allocating space for many objects at once (e.g., by storing them in a std::vector) instead of calling new for each individual object.
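A small sketch of why contiguous storage helps: elements of a std::vector sit exactly sizeof(T) bytes apart, with no allocator metadata in between. `Item` and `vector_stride` are hypothetical names for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object type.
struct Item {
    double value;
    int    id;
};

// Returns the byte distance between two neighboring vector elements.
std::size_t vector_stride() {
    std::vector<Item> items(1000);  // one allocation for all 1000 objects
    assert(items.size() >= 2);
    const char* first  = reinterpret_cast<const char*>(&items[0]);
    const char* second = reinterpret_cast<const char*>(&items[1]);
    return static_cast<std::size_t>(second - first);
}
```

Because the stride equals sizeof(Item), any reduction in object size from the alignment setting translates directly into memory saved here, which is not necessarily the case with one new per object.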

