
Performance of DWORD vs QWORD alignment in 32-bit code

I have a lot of objects that inherit from one another.

By default, Embarcadero C++ Builder 2009 sets Data Alignment in the project properties to QWORD. If I change this to DWORD, many of my objects shrink, because they often have 4 bytes to spare, and the same happens again in each inheriting object. Accumulated over the hierarchy, this has a good effect.

Shrinking them is appealing, since I sometimes need to allocate millions of them in memory.

I would like to know why QWORD is the default for a 32-bit application. I expected it to be DWORD. Will changing it to DWORD create performance issues?

Also, since I allocate lots of them in memory: are they allocated nicely packed together, one after the other, or is there padding between them as well, and does that padding also depend on the project setting (Data Alignment: QWORD / DWORD)? If the objects are all allocated on QWORD boundaries, changing the objects' actual sizes won't have a net effect.

Overall, there are a number of things to consider for alignment:

First, according to the Wikipedia page on Data Structure Alignment, Embarcadero might be a bit of an exception if it aligns all objects to 8-byte boundaries. The article claims that GCC, VC++ and Borland's compiler don't align data at 8 bytes unless it's a double or long long.
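To see how much a 4-byte versus 8-byte setting matters for a given class hierarchy, one option is to compare sizes directly. The sketch below uses #pragma pack as a stand-in for the project-wide Data Alignment option (an assumption; the IDE setting may not map to it exactly), and the struct names are hypothetical. It also relies on C++11 features (static_assert, alignof, constexpr) that an older compiler may not support.

```cpp
#include <cstddef>

// Hypothetical example types, not taken from the original post.

#pragma pack(push, 8)            // stand-in for "QWORD" data alignment
struct Base8 {
    double weight;               // 8 bytes, usually 8-byte aligned
    int    id;                   // 4 bytes, usually followed by tail padding
};
struct Derived8 : Base8 {
    int extra;                   // may trigger another round of padding
};
struct Ints8 { int a; int b; int c; };   // no 8-byte member at all
#pragma pack(pop)

#pragma pack(push, 4)            // stand-in for "DWORD" data alignment
struct Base4 {
    double weight;               // member alignment capped at 4
    int    id;
};
struct Derived4 : Base4 {
    int extra;
};
struct Ints4 { int a; int b; int c; };
#pragma pack(pop)

// Capping alignment at 4 can only remove padding, never add it.
static_assert(sizeof(Base4) <= sizeof(Base8), "4-byte packing should not grow Base");
static_assert(sizeof(Derived4) <= sizeof(Derived8), "4-byte packing should not grow Derived");

// A struct with only 4-byte members is unaffected by the setting.
static_assert(sizeof(Ints4) == sizeof(Ints8), "int-only structs should not change");

// With the cap in place, the struct itself needs at most 4-byte alignment.
static_assert(alignof(Base4) <= 4, "Base4 should need at most 4-byte alignment");

// Bytes saved per object by the smaller alignment (compiler dependent).
constexpr std::size_t bytes_saved_per_object() {
    return sizeof(Derived8) - sizeof(Derived4);
}
```

Printing sizeof(Derived8), sizeof(Derived4) and bytes_saved_per_object() on the actual compiler is more reliable than assuming values, since padding rules differ between compilers and targets. Only classes containing 8-byte members such as double or long long should change size.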

A number of things do force alignment: 许多事情会强制对齐:

  • malloc and operator new typically give you memory regions that are at least 8-byte aligned. Also, if you make separate calls to malloc or new, the objects won't be nicely packed together: there will usually be at least 8 or so bytes of allocator metadata between them, and there is no guarantee that the objects end up close to each other in memory.
  • Stack frames are aligned to 8 or 16 bytes, depending on the architecture.
  • SSE instructions generally need 16-byte aligned data, unless the unaligned load/store variants are used.
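The allocator behaviour in the first point can be checked empirically. The following is a minimal sketch, with hypothetical helper names (is_aligned, gap_between_separate_allocations) and a hypothetical 12-byte struct; the distance it reports depends entirely on the runtime's allocator and says nothing definitive about C++ Builder's memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Returns true if the address p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical small payload: three ints, typically 12 bytes.
struct Small { int a; int b; int c; };

// Allocates two objects with separate calls to new and returns the
// distance between their addresses in bytes. Anything beyond
// sizeof(Small) is padding plus allocator bookkeeping.
inline std::size_t gap_between_separate_allocations() {
    Small* first  = new Small();
    Small* second = new Small();
    assert(first != second);

    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(first);
    const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(second);
    const std::size_t distance = static_cast<std::size_t>(a < b ? b - a : a - b);

    delete second;
    delete first;
    return distance;
}
```

If is_aligned(ptr, 8) holds for every pointer returned by new, and the reported distance is a multiple of 8 larger than sizeof(Small), then shrinking an individually allocated object from 16 to 12 bytes is unlikely to save any memory, which is the concern raised in the question.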

Regarding performance: I don't think you'll see a large difference between 4-byte and 8-byte alignment. Daniel Lemire measured this and found small, if any, differences between 1-byte and 4-byte aligned data; I expect them to be even smaller between 4 and 8 bytes.

Probably the biggest performance and memory usage difference in your scenario would come from allocating space for many objects at once (e.g., by storing them in a std::vector) instead of calling new for each individual object.
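A rough sketch of the bulk-allocation approach follows; Particle, make_particles and is_tightly_packed are hypothetical names chosen for illustration. In a std::vector the elements are contiguous, so consecutive objects sit exactly sizeof(T) bytes apart with no per-object allocator overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object type standing in for the real classes.
struct Particle {
    double x;
    int    id;
};

// Returns true if consecutive elements are exactly sizeof(T) bytes apart,
// i.e. there is nothing between them except the padding already counted
// in sizeof(T).
template <typename T>
bool is_tightly_packed(const std::vector<T>& items) {
    for (std::size_t i = 1; i < items.size(); ++i) {
        const char* prev = reinterpret_cast<const char*>(&items[i - 1]);
        const char* curr = reinterpret_cast<const char*>(&items[i]);
        if (curr - prev != static_cast<std::ptrdiff_t>(sizeof(T))) {
            return false;
        }
    }
    return true;
}

// Creates `count` objects with a single underlying allocation
// instead of `count` separate calls to new.
inline std::vector<Particle> make_particles(std::size_t count) {
    std::vector<Particle> particles(count);
    for (std::size_t i = 0; i < count; ++i) {
        particles[i].id = static_cast<int>(i);
    }
    assert(particles.size() == count);
    return particles;
}
```

Because the stride between elements is sizeof(T), any tail padding removed by a smaller alignment setting translates directly into memory saved here, unlike with individually allocated objects. The trade-off is that a vector stores objects by value, so a polymorphic hierarchy would need one vector per concrete type or some other pooling scheme.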
