Why is the "alignment" the same on 32-bit and 64-bit systems?

Question

I was wondering whether the compiler would use different padding on 32-bit and 64-bit systems, so I wrote the code below in a simple VS2019 C++ console project:

#include <iostream>

struct Z
{
    char s;
    __int64 i;
};

int main()
{
    std::cout << sizeof(Z) << "\n";
}

What I expected on each "Platform" setting:

x86: 12
x64: 16

Actual result:

x86: 16
x64: 16

Since the memory word size on x86 is 4 bytes, this means it has to store the bytes of i in two different words. So I thought the compiler would do padding this way:

struct Z
{
    char s;
    char _pad[3];
    __int64 i;
};

So may I know what the reason behind this is?

For forward-compatibility with the 64-bit system?
Due to the limitation of supporting 64-bit numbers on the 32-bit processor?

Answer 1

The padding is not determined by the word size, but by the alignment of each data type.

In most cases, the alignment requirement is equal to the type's size. So for a 64-bit type like __int64 you will get 8-byte (64-bit) alignment. Padding needs to be inserted into the struct to make sure that the storage for the type ends up at an address that is properly aligned.
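To make the padding rule concrete, here is a minimal sketch that lays out the question's struct by hand. The helper names round_up and size_of_char_then are made up for this illustration, and the model assumes power-of-two alignments and a struct whose only members are a char followed by one other member:

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `align`.
// Assumes `align` is a power of two.
constexpr std::size_t round_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Hypothetical helper: size of `struct { char s; T i; }` for a member T with
// the given size and alignment. The inner round_up places `i` after the char,
// the outer one adds tail padding so arrays of the struct stay aligned.
constexpr std::size_t size_of_char_then(std::size_t member_size,
                                        std::size_t member_align) {
    return round_up(round_up(1, member_align) + member_size, member_align);
}

// 8-byte member aligned to 8: 1 + 7 padding + 8 = 16 (what MSVC produces).
static_assert(size_of_char_then(8, 8) == 16, "8-byte alignment gives 16");
// 8-byte member aligned to 4: 1 + 3 padding + 8 = 12 (what the question expected).
static_assert(size_of_char_then(8, 4) == 12, "4-byte alignment gives 12");
```

The two static_assert lines reproduce the 16 and 12 from the question: the difference comes entirely from the alignment passed in, not from any word size.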

You may see a difference in padding between 32 bit and 64 bit when using built-in datatypes that have different sizes on both architectures, for instance pointer types (int*).
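A small sketch of the pointer case, for anyone who wants to try it on both platform settings. The struct name WithPointer and the function print_pointer_layout are invented for this example, and the values printed depend on the target, so none are asserted here:

```cpp
#include <cstddef>
#include <iostream>

struct WithPointer {
    char tag;
    int* p;   // typically 4 bytes on 32-bit targets and 8 bytes on 64-bit targets
};

// Whatever the target, the pointer member sits at a multiple of its alignment.
static_assert(offsetof(WithPointer, p) % alignof(int*) == 0, "p is misaligned");

// Print the numbers that change between a 32-bit and a 64-bit build.
void print_pointer_layout() {
    std::cout << "sizeof(int*)            = " << sizeof(int*) << "\n"
              << "alignof(int*)           = " << alignof(int*) << "\n"
              << "offsetof(WithPointer,p) = " << offsetof(WithPointer, p) << "\n"
              << "sizeof(WithPointer)     = " << sizeof(WithPointer) << "\n";
}
```

Calling print_pointer_layout() from main in an x86 build and again in an x64 build should show the padding after tag growing along with the pointer size.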

Answer 2

This is a matter of the alignment requirement of the data type, as specified in Padding and Alignment of Structure Members:

Every data object has an alignment-requirement. The alignment-requirement for all data except structures, unions, and arrays is either the size of the object or the current packing size (specified with either /Zp or the pack pragma, whichever is less).

And the default value for structure member alignment is specified in /Zp (Struct Member Alignment):

The available packing values are described in the following table:

/Zp argument    Effect
1               Packs structures on 1-byte boundaries. Same as /Zp.
2               Packs structures on 2-byte boundaries.
4               Packs structures on 4-byte boundaries.
8               Packs structures on 8-byte boundaries (default for x86, ARM, and ARM64).
16              Packs structures on 16-byte boundaries (default for x64).

Since the default for x86 is /Zp8, the 8-byte __int64 member is aligned to min(8, 8) = 8 bytes, so the output is 16.

However, you can specify a different packing size with the /Zp option.
Here is a Live Demo with /Zp4 which gives the output as 12 instead of 16.
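The same effect can be sketched per struct with the pack pragma instead of the command-line option. The struct name PackedZ is made up for this example, and #pragma pack is a compiler extension (accepted by MSVC, GCC, and Clang) rather than standard C++:

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 4)   // cap member alignment at 4 bytes, like /Zp4
struct PackedZ {
    char s;
    std::int64_t i;     // aligned to min(4, 8) = 4 inside this struct
};
#pragma pack(pop)       // restore the previous packing for later declarations

static_assert(offsetof(PackedZ, i) == 4, "i follows 3 bytes of padding");
static_assert(sizeof(PackedZ) == 12, "1 + 3 padding + 8");
```

Using push and pop keeps the changed packing local to this one struct, which matters because packing changes the layout every other translation unit has to agree on.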

Answer 3

The size and alignof() (the minimum alignment that any object of that type must have) for each primitive type are ABI ¹ design choices, separate from the register width of the architecture.

Struct-packing rules can also be more complicated than just aligning each struct member to its minimum alignment inside the struct; that's another part of the ABI.

MSVC targeting 32-bit x86 gives __int64 a minimum alignment of 4, but its default struct-packing rules align types within structs to min(8, sizeof(T)) relative to the start of the struct (for non-aggregate types only). That's not a direct quote; it's my paraphrase of the MSVC docs link from @PW's answer, based on what MSVC seems to actually do. (I suspect the "whichever is less" in the text is supposed to be outside the parens, but maybe they're making a different point about the interaction between the pragma and the command-line option?)

(An 8-byte struct containing a char[8] still only gets 1-byte alignment inside another struct, and a struct containing an alignas(16) member still gets 16-byte alignment inside another struct.)
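A sketch of those two aggregate cases. All four struct names are invented for illustration. The alignas(16) results follow from the language rules, while the 1-byte alignment of the char[8] struct is what mainstream ABIs do rather than something the standard promises:

```cpp
#include <cstddef>

struct Bytes8 { char c[8]; };            // 8 bytes, but built only from chars
struct Over16 { alignas(16) char c; };   // one over-aligned member

struct HoldsBytes8 { char tag; Bytes8 b; };
struct HoldsOver16 { char tag; Over16 o; };

// An aggregate inherits the strictest alignment among its members,
// so its own size plays no part.
static_assert(alignof(Over16) == 16, "alignas(16) member raises the struct's alignment");
static_assert(offsetof(HoldsOver16, o) == 16, "15 bytes of padding after tag");
static_assert(sizeof(HoldsOver16) == 32, "tail padding keeps arrays aligned");

// Assumption: holds on mainstream ABIs, not guaranteed by ISO C++.
static_assert(alignof(Bytes8) == 1, "char array needs only 1-byte alignment");
static_assert(offsetof(HoldsBytes8, b) == 1, "no padding before b");
```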

Note that ISO C++ doesn't guarantee that primitive types have alignof(T) == sizeof(T). Also note that MSVC's definition of alignof() doesn't match the ISO C++ standard: MSVC says alignof(__int64) == 8, but some __int64 objects have less than that alignment ².

So surprisingly, we get extra padding even though MSVC doesn't always bother to make sure the struct itself has any more than 4-byte alignment, unless you specify that with alignas() on the variable, or on a struct member to imply that for the type. (e.g. a local struct Z tmp on the stack inside a function will only have 4-byte alignment, because MSVC doesn't use extra instructions like and esp, -8 to round the stack pointer down to an 8-byte boundary.)

However, new / malloc does give you 8-byte-aligned memory in 32-bit mode, so this makes a lot of sense for dynamically-allocated objects (which are common). Forcing locals on the stack to be fully aligned would add cost to align the stack pointer, but by setting struct layout to take advantage of 8-byte-aligned storage, we get the advantage for static and dynamic storage.
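The allocator claim is easy to probe at run time. This is only an empirical sketch: the names is_aligned and malloc_blocks_are_8_aligned are invented here, and passing on one machine demonstrates the behaviour of that C runtime, not a guarantee:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// True if the address in `p` is a multiple of `align` (a power of two).
inline bool is_aligned(const void* p, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Allocate blocks of 8 to 64 bytes and report whether each one came back
// on an 8-byte boundary.
inline bool malloc_blocks_are_8_aligned() {
    bool all_aligned = true;
    for (std::size_t size = 8; size <= 64; ++size) {
        void* p = std::malloc(size);
        if (p == nullptr) continue;   // a failed allocation says nothing about alignment
        all_aligned = all_aligned && is_aligned(p, 8);
        std::free(p);
    }
    return all_aligned;
}
```

The same is_aligned helper can be pointed at the address of a local variable to see whether a particular stack slot happened to land on an 8-byte boundary.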

This might also be designed to get 32-bit and 64-bit code to agree on some struct layouts for shared memory. (But note that the default for x86-64 is min(16, sizeof(T)), so they still don't fully agree on struct layout if there are any 16-byte types that aren't aggregates (struct/union/array) and don't have an alignas.)
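If two builds really do need to share a struct, one defensive option is to spell the layout out and let the compiler verify it. This is a sketch of that idea, not something the MSVC docs prescribe; SharedRecord and its member names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record placed in memory shared by 32-bit and 64-bit processes.
struct SharedRecord {
    char tag;
    char reserved[7];                // explicit padding instead of compiler-chosen padding
    alignas(8) std::int64_t value;   // request 8-byte alignment whatever the ABI default is
};

// Compilation fails in any build where the layout drifts.
static_assert(offsetof(SharedRecord, value) == 8, "value must start at byte 8");
static_assert(sizeof(SharedRecord) == 16, "record must be exactly 16 bytes");
static_assert(alignof(SharedRecord) == 8, "record must be 8-byte aligned");
```

With the padding written out as a member and the alignment requested explicitly, the layout no longer depends on the /Zp default or on the target's alignof for 64-bit integers.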

The minimum absolute alignment of 4 comes from the 4-byte stack alignment that 32-bit code can assume. In static storage, compilers will choose natural alignment up to maybe 8 or 16 bytes for vars outside of structs, for efficient copying with SSE2 vectors.

In larger functions, MSVC may decide to align the stack by 8 for performance reasons, e.g. for double vars on the stack which actually can be manipulated with single instructions, or maybe also for int64_t with SSE2 vectors. See the Stack Alignment section in this 2006 article: Windows Data Alignment on IPF, x86, and x64. So in 32-bit code you can't depend on an int64_t* or double* being naturally aligned.

(I'm not sure if MSVC will ever create even less aligned int64_t or double objects on its own. Certainly yes if you use #pragma pack(1) or -Zp1, but that changes the ABI. But otherwise probably not, unless you carve space for an int64_t out of a buffer manually and don't bother to align it. But assuming alignof(int64_t) is still 8, that would be C++ undefined behaviour.)
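When an int64_t does have to live at an arbitrary offset in a byte buffer, the usual way to stay clear of that undefined behaviour is to copy the bytes instead of dereferencing a misaligned int64_t*. A minimal sketch, with load_i64, store_i64, and the demo function being names made up for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read an int64_t from any byte address, aligned or not.
inline std::int64_t load_i64(const unsigned char* p) {
    std::int64_t value;
    std::memcpy(&value, p, sizeof value);   // byte copy: no alignment requirement
    return value;
}

// Write an int64_t to any byte address, aligned or not.
inline void store_i64(unsigned char* p, std::int64_t value) {
    std::memcpy(p, &value, sizeof value);
}

// Round-trip a value through an odd offset in a buffer.
inline void unaligned_round_trip_demo() {
    unsigned char buffer[16] = {};
    store_i64(buffer + 1, 0x0123456789ABCDEF);            // offset 1 is not 8-aligned
    assert(load_i64(buffer + 1) == 0x0123456789ABCDEF);
}
```

Compilers targeting x86 generally turn these fixed-size memcpy calls into plain loads and stores, so the safe version usually costs nothing extra.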

If you use alignas(8) int64_t tmp, MSVC emits an extra and esp, -8 instruction. If you don't, MSVC doesn't do anything special, so it's down to luck whether tmp ends up 8-byte aligned.

Other designs are possible, for example the i386 System V ABI (used on most non-Windows OSes) has alignof(long long) = 4 but sizeof(long long) = 8.

Outside of structs (e.g. global vars or locals on the stack), modern compilers in 32-bit mode do choose to align int64_t to an 8-byte boundary for efficiency (so it can be loaded / copied with MMX or SSE2 64-bit loads, or x87 fild to do int64_t -> double conversion).

This is one reason why modern versions of the i386 System V ABI maintain 16-byte stack alignment: so 8-byte and 16-byte aligned local vars are possible.

When the 32-bit Windows ABI was being designed, Pentium CPUs were at least on the horizon. Pentium has 64-bit wide data buses, so its FPU really can load a 64-bit double in a single cache access if it's 64-bit aligned.

Or for fild / fistp, load/store a 64-bit integer when converting to/from double. Fun fact: naturally aligned accesses up to 64 bits are guaranteed atomic on x86, since Pentium; see "Why is integer assignment on a naturally aligned variable atomic on x86?"

Footnote 1: An ABI also includes a calling convention (or in the case of MS Windows, a choice of various calling conventions which you can declare with function attributes like __fastcall), but the sizes and alignment-requirements for primitive types like long long are also something that compilers have to agree on to make functions that can call each other. (The ISO C++ standard only talks about a single "C++ implementation"; ABI standards are how "C++ implementations" make themselves compatible with each other.)

Note that struct-layout rules are also part of the ABI: compilers have to agree with each other on struct layout to create compatible binaries that pass around structs or pointers to structs. Otherwise s.x = 10; foo(&s); might write to a different offset relative to the base of the struct than separately-compiled foo() (maybe in a DLL) was expecting to read it at.

Footnote 2:

GCC had this C++ alignof() bug, too, until it was fixed in 2018 for g++8, some time after being fixed for C11 _Alignof(). See that bug report for some discussion based on quotes from the standard which conclude that alignof(T) should really report the minimum guaranteed alignment you can ever see, not the preferred alignment you want for performance, i.e. that using an int64_t* with less than alignof(int64_t) alignment is undefined behaviour.

(It will usually work fine on x86, but vectorization that assumes a whole number of int64_t iterations will reach a 16 or 32-byte alignment boundary can fault. See "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?" for an example with gcc.)

The gcc bug report discusses the i386 System V ABI, which has different struct-packing rules than MSVC: based on minimum alignment, not preferred. But modern i386 System V maintains 16-byte stack alignment, so it's only inside structs (because of struct-packing rules that are part of the ABI) that the compiler ever creates int64_t and double objects that are less than naturally aligned. Anyway, that's why the GCC bug report was discussing struct members as the special case.

This is kind of the opposite of 32-bit Windows with MSVC, where the struct-packing rules are compatible with an alignof(int64_t) == 8 but locals on the stack are always potentially under-aligned unless you use alignas() to specifically request alignment.

32-bit MSVC has the bizarre behaviour that alignas(int64_t) int64_t tmp; is not the same as int64_t tmp;, and emits extra instructions to align the stack. That's because alignas(int64_t) is like alignas(8), which is more aligned than the actual minimum.

#include <cstdint>   // int64_t

void extfunc(int64_t *);   // defined elsewhere, so the call can't be inlined away

void foo_align8(void) {
    alignas(int64_t) int64_t tmp;
    extfunc(&tmp);
}

(32-bit) x86 MSVC 19.20 -O2 compiles it like so (on Godbolt, also includes 32-bit GCC and the struct test-case):

_tmp$ = -8                                          ; size = 8
void foo_align8(void) PROC                       ; foo_align8, COMDAT
        push    ebp
        mov     ebp, esp
        and     esp, -8                             ; fffffff8H  align the stack
        sub     esp, 8                                  ; and reserve 8 bytes
        lea     eax, DWORD PTR _tmp$[esp+8]             ; get a pointer to those 8 bytes
        push    eax                                     ; pass the pointer as an arg
        call    void extfunc(__int64 *)           ; extfunc
        add     esp, 4
        mov     esp, ebp
        pop     ebp
        ret     0

But without the alignas(), or with alignas(4), we get the much simpler:

_tmp$ = -8                                          ; size = 8
void foo_noalign(void) PROC                                ; foo_noalign, COMDAT
        sub     esp, 8                             ; reserve 8 bytes
        lea     eax, DWORD PTR _tmp$[esp+8]        ; "calculate" a pointer to it
        push    eax                                ; pass the pointer as a function arg
        call    void extfunc(__int64 *)           ; extfunc
        add     esp, 12                             ; 0000000cH
        ret     0

It could just push esp instead of LEA/push; that's a minor missed optimization.

Passing a pointer to a non-inline function proves that it's not just locally bending the rules. Some other function that just gets an int64_t* as an arg has to deal with this potentially under-aligned pointer, without having gotten any information about where it came from.

If alignof(int64_t) was really 8, that function could be hand-written in asm in a way that faulted on misaligned pointers. Or it could be written in C with SSE2 intrinsics like _mm_load_si128() that require 16-byte alignment, after handling 0 or 1 elements to reach an alignment boundary.

But with MSVC's actual behaviour, it's possible that none of the int64_t array elements are aligned by 16, because they all span an 8-byte boundary.
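The "0 or 1 elements" reasoning, and how it breaks, can be written down as plain address arithmetic. This sketch works on integer addresses only; the function name scalar_prologue_elements and the sample addresses are made up for illustration:

```cpp
#include <cstdint>

// For an int64_t array starting at `address`, return how many leading
// elements must be handled one at a time before the next element sits on a
// 16-byte boundary, or -1 if no element ever will.
constexpr int scalar_prologue_elements(std::uintptr_t address) {
    return address % 8 != 0 ? -1    // under-aligned start: stepping by 8 never hits a multiple of 16
         : address % 16 == 0 ? 0    // already on a 16-byte boundary
         : 1;                       // 8 mod 16: one scalar element gets us there
}

static_assert(scalar_prologue_elements(0x1000) == 0, "aligned start needs no prologue");
static_assert(scalar_prologue_elements(0x1008) == 1, "one element reaches the boundary");
static_assert(scalar_prologue_elements(0x1004) == -1, "4-byte-aligned start never reaches it");
```

The -1 case is the one a 4-byte-aligned stack slot can produce: code that assumes the answer is always 0 or 1 would then issue an aligned 16-byte load on a misaligned address.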

BTW, I wouldn't recommend using compiler-specific types like __int64 directly. You can write portable code by using int64_t from <cstdint>, aka <stdint.h>.

In MSVC, int64_t will be the same type as __int64.

On other platforms, it will typically be long or long long. int64_t is guaranteed to be exactly 64 bits with no padding, and 2's complement, if provided at all. (It is by all sane compilers targeting normal CPUs. C99 and C++ require long long to be at least 64-bit, and on machines with 8-bit bytes and registers that are a power of 2, long long is normally exactly 64 bits and can be used as int64_t. Or if long is a 64-bit type, then <cstdint> might use that as the typedef.)
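These guarantees can be checked at compile time. A short sketch, where the two constant names int64_is_long_long and int64_is_long are invented for this example and their values depend on the platform:

```cpp
#include <climits>
#include <cstdint>
#include <limits>
#include <type_traits>

// Guaranteed wherever int64_t is provided at all.
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "exactly 64 bits, no padding");
static_assert(std::numeric_limits<std::int64_t>::digits == 63, "63 value bits plus a sign bit");

// Guaranteed by the language for long long.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");

// Which built-in type the typedef names is up to the implementation.
constexpr bool int64_is_long_long = std::is_same<std::int64_t, long long>::value;
constexpr bool int64_is_long      = std::is_same<std::int64_t, long>::value;
```

On MSVC one would expect int64_is_long_long to be true, and on 64-bit Linux int64_is_long, but neither is asserted here because it is an implementation choice.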

I assume __int64 and long long are the same type in MSVC, but MSVC doesn't enforce strict-aliasing anyway so it doesn't matter whether they're the exact same type or not, just that they use the same representation.

Answer 4

A struct's alignment is that of its most strictly aligned member, which is usually the size of its largest primitive member.

That means if you have an 8-byte (64-bit) member in the struct, then the struct will align to 8 bytes.

In the case that you are describing, if the compiler allowed the struct to align to 4 bytes, an 8-byte member could end up straddling a cache line boundary.

Say we have a CPU that has a 16-byte cache line. Consider a struct like this:

struct Z
{
    char s;      // bytes 1-4 (s plus 3 bytes of padding)
    __int64 i;   // bytes 5-12
    __int64 i2;  // bytes 13-20: reading this variable needs two cache line fetches
};
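The cache-line argument can be checked with a little arithmetic. This sketch uses 0-based offsets (the comments above count bytes from 1), assumes the struct itself starts on a cache line boundary, and uses a helper name, crosses_line, invented for this example:

```cpp
#include <cstddef>

// True if the byte range [offset, offset + size) touches more than one
// cache line of `line_size` bytes.
constexpr bool crosses_line(std::size_t offset, std::size_t size,
                            std::size_t line_size) {
    return offset / line_size != (offset + size - 1) / line_size;
}

// Hypothetical 4-byte-aligned layout: s at 0, i at 4, i2 at 12.
static_assert(crosses_line(12, 8, 16), "i2 occupies bytes 12..19 and spans two lines");

// 8-byte-aligned layout: s at 0, i at 8, i2 at 16.
static_assert(!crosses_line(8, 8, 16), "i stays within the first line");
static_assert(!crosses_line(16, 8, 16), "i2 stays within the second line");
```

An 8-byte object at an 8-byte-aligned offset can never straddle a line whose size is a multiple of 8, which is the property the extra padding buys.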

Answer 1
12 2019-04-30 11:43:48

Answer 2
8 2019-04-30 11:57:05

Answer 3
4 accepted 2019-05-01 01:33:19

Answer 4
-2 2019-04-30 15:02:01
