
Get memory granularity of a processor

How can I get the memory granularity of a CPU in C?

Suppose I want to allocate an array where all the elements are properly aligned in memory. I can pad each element to a certain size N to achieve this. How do I know the value of N?

Note: I am trying to create a memory pool where each slot is memory-aligned. Any suggestions will be appreciated.

In Theory

How to get the memory granularity of a CPU in C?

First, you read the instruction set architecture manual. It may specify that certain instructions require certain alignments, or even that the addressing forms in certain instructions cannot represent non-aligned addresses. It may specify other properties regarding alignment.

Second, you read the processor manual. It may specify performance characteristics (such as that unaligned loads or stores are supported but may be slower or use more resources than aligned loads or stores) and may specify which of the various options allowed by the instruction set architecture the processor implements.

Third, you read the operating system documentation. Some architectures allow the operating system to select features related to alignment, such as whether unaligned loads and stores are made to fail or are supported, albeit with slower performance than aligned loads or stores. The operating system documentation should have this information.

In Practice

For many programming situations, what you need to know is not the "memory granularity" of a CPU but the alignment requirements of the C implementation you are using (or of whatever language you are using). And, for the most part, you do not need to know the alignment requirements directly but just need to follow the language rules about managing objects: use objects with declared types, do not use casts to convert pointers between incompatible types except where specific rules allow it, use the suitably aligned memory provided by malloc rather than adjusting your own pointers to bytes, and so on. Following these rules will give good alignment for the objects in your program.

In C, when you define an array, the element size will automatically be the size that the C implementation needs for its alignment. For example, long double x[100]; may use 16 bytes for each array element even though the hardware uses only ten bytes for a long double. Or, for any struct foo that you define, the compiler will automatically include padding as needed in the structure to give the desired alignment, and any array struct foo x[100]; will already include that padding. sizeof(struct foo) will be the same as sizeof x[0], because each structure object has that padding built in, even a single structure object, not just the elements of arrays.
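To make the padding point concrete, here is a small sketch. The struct foo layout (a char followed by a double) and the helper names are made up for illustration; the exact amount of padding depends on your C implementation, so the code does not assume a particular number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type: a char followed by a double. */
struct foo {
    char tag;
    double value;
};

/* Number of padding bytes the compiler added to struct foo:
   its total size minus the sizes of its two members. */
size_t foo_padding_bytes(void)
{
    return sizeof(struct foo) - (sizeof(char) + sizeof(double));
}

/* Distance in bytes between consecutive elements of an array of struct foo. */
size_t foo_array_stride(void)
{
    struct foo x[100];
    size_t stride = (size_t)((char *)&x[1] - (char *)&x[0]);

    /* The stride is the element size; the array adds no padding of its own. */
    assert(stride == sizeof x[0]);
    return stride;
}
```

On a typical 64-bit implementation foo_padding_bytes() would likely return 7, but the only thing you can rely on is that foo_array_stride() equals sizeof(struct foo).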

When you do need to know the alignment that a C implementation requires for a type, you can use C's _Alignof operator. The expression _Alignof(type) provides the alignment required for type.
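For the memory pool in the question, one possible approach is to round each slot up to a multiple of the required alignment. The following is only a sketch under some assumptions: struct pool, pool_init, pool_slot, pool_destroy and round_up are hypothetical names, the pool is meant for arbitrary object types (hence _Alignof(max_align_t) from C11's <stddef.h>), and malloc is assumed to return storage aligned for max_align_t, which the C standard requires for fundamental alignments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round size up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical fixed-size pool; the names are illustrative only. */
struct pool {
    unsigned char *base;   /* start of the storage obtained from malloc */
    size_t slot_size;      /* object size rounded up to a multiple of the alignment */
    size_t count;          /* number of slots */
};

/* Returns 0 on success, -1 on size overflow or allocation failure. */
int pool_init(struct pool *p, size_t object_size, size_t count)
{
    assert(object_size > 0 && count > 0);
    p->base = NULL;
    p->slot_size = round_up(object_size, _Alignof(max_align_t));
    p->count = count;
    if (count > SIZE_MAX / p->slot_size)
        return -1;
    p->base = malloc(p->slot_size * count);
    return p->base != NULL ? 0 : -1;
}

/* Address of slot i. Because the base is suitably aligned and every slot size
   is a multiple of the alignment, every slot is suitably aligned too. */
void *pool_slot(const struct pool *p, size_t i)
{
    assert(i < p->count);
    return p->base + i * p->slot_size;
}

void pool_destroy(struct pool *p)
{
    free(p->base);
    p->base = NULL;
}
```

If the pool only ever holds one known type T, then N is simply sizeof(T): it is already a multiple of _Alignof(T), so no extra padding is needed.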

Other

… properly memory aligned.

Proper alignment is a matter of degree:

  • What the processor supports may determine whether your program works or does not work. In this sense, an improper alignment is one that causes your program to trap.
  • What is efficient with respect to individual loads and stores may affect how fast your program runs. In this sense, an improper alignment is one that causes your program to execute more slowly.
  • In certain performance-critical situations, alignment with respect to cache and memory mapping features can also affect performance.

Short answer

Use 64 bytes.

Long answer

Data are loaded from and stored to memory in units called cache lines. If your program loads only part of the data in a cache line, the whole line is still loaded into the CPU caches. Perhaps more importantly, the algorithm used for moving data between cores in a multi-core CPU operates on full cache lines; aligning your data to cache lines avoids false sharing, the situation where a cache line bounces between cores because it contains data manipulated by different threads.
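As an illustration of avoiding false sharing, each thread's data can be given its own cache line with C11's _Alignas. This is a sketch, not a definitive implementation: CACHE_LINE, NUM_THREADS, struct padded_counter and the helper functions are hypothetical names, and the value 64 is the assumption from this answer rather than something the code detects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64    /* assumed cache line size in bytes */
#define NUM_THREADS 4    /* hypothetical number of worker threads */

/* Aligning the member to CACHE_LINE also makes the whole struct
   CACHE_LINE-aligned and pads its size to a multiple of CACHE_LINE. */
struct padded_counter {
    _Alignas(CACHE_LINE) unsigned long value;
};

/* One counter per thread; no two counters share a cache line. */
static struct padded_counter counters[NUM_THREADS];

/* Returns the counter reserved for thread number t. */
unsigned long *counter_for_thread(size_t t)
{
    assert(t < NUM_THREADS);
    return &counters[t].value;
}

/* Nonzero if p starts on a CACHE_LINE boundary. */
int is_cache_aligned(const void *p)
{
    return (uintptr_t)p % CACHE_LINE == 0;
}
```

Each thread then updates only *counter_for_thread(t). The cost is memory: every counter occupies a full 64 bytes instead of sizeof(unsigned long).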

Cache line sizes used to depend heavily on the architecture, ranging from 16 up to 512 bytes. Today, most mainstream processors (Intel, AMD, and most ARM and MIPS designs) use a cache line of 64 bytes, though not all do; IBM POWER, for example, uses 128-byte lines.
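If you would rather not hard-code the number, some systems let you ask at run time. The sketch below assumes a POSIX-like system; _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension to sysconf, not part of POSIX, so the code guards it with #ifdef and falls back to 64 when it is missing or reports nothing useful. The function names cache_line_size and padded_slot_size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Best-effort L1 data cache line size in bytes; falls back to 64. */
size_t cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return (size_t)n;
#endif
    return 64;   /* assumption used when the system cannot tell us */
}

/* Size of a pool slot that holds object_size bytes and occupies
   a whole number of cache lines. */
size_t padded_slot_size(size_t object_size)
{
    size_t line = cache_line_size();

    assert(object_size > 0);
    return (object_size + line - 1) / line * line;
}
```

A value obtained this way is only known at run time, so it can size dynamically allocated slots but cannot be used in _Alignas, which needs a compile-time constant.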

This depends heavily on the CPU microarchitecture that you are using.

In many cases, the memory address of an operand should be a multiple of the operand size; otherwise execution will be slow (or might even raise an exception).

But there are also CPUs that do not care about a specific alignment of the operands in memory at all.

Usually, the C compiler will take care of those details for you. You should, however, make sure that the compiler assumes the correct target (micro-)architecture, for example by specifying it with the correct compiler flags (-march=? on gcc).

