
C++ performance of accessing heap vs. stack objects

I know that using heap objects is slower because of the necessary memory management (allocation, deallocation). What about accessing them? Is there any performance difference when accessing an object on the stack vs. the heap?

EDIT: My question is not about allocation but about access. This includes the memory location of stack vs. heap, cache misses, or any other variable that I am not aware of. With a simple toy example:

int stack_array[100];
int* heap_array = new int[100];
...
...
std::cout << stack_array[51]; // Any difference between these two statements
std::cout << heap_array[51]; // Any difference between these two statements

You probably won't notice any speed difference between stack and heap (dynamic memory) unless the physical memory is different.

Access to an array element is direct, regardless of the memory location. You can confirm this by looking at the assembly language generated by the compiler.
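As a rough sketch of what to look for in the assembly, the snippet below reads the same index from a stack array and a heap array through one function. The names `read_at` and `sum_of_both` are made up for illustration and are not from the question. Compiling with something like `g++ -O1 -S` should show `read_at` as a single load from `base + i * sizeof(int)`; the only difference you might see at the call sites is that the heap pointer itself has to be loaded into a register first.

```cpp
#include <cstddef>

// Reads one element through a pointer. The load is from
// base + i * sizeof(int); nothing in it depends on whether
// `base` points into the stack or the heap.
int read_at(const int* base, std::size_t i) {
    return base[i];
}

// Hypothetical helper for illustration: fills a stack array and a
// heap array with the same values and reads index i from each.
int sum_of_both(std::size_t i) {
    int stack_array[100];
    int* heap_array = new int[100];
    for (int k = 0; k < 100; ++k) {
        stack_array[k] = k;
        heap_array[k] = k;
    }
    int result = read_at(stack_array, i) + read_at(heap_array, i);
    delete[] heap_array;  // the heap array must be released by hand
    return result;
}
```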

There could be a difference if the OS pages parts of your arrays out of physical memory. This means the OS could write chunks of an array to the hard drive and swap them back in on demand.

In most applications, if there is a physical difference (in terms of speed) between memory types, it will be negligible, on the order of nanoseconds. For more computationally intensive applications (lots of data or a need for speed), this could make a difference.

However, there are other issues that make memory access a non-issue, such as:

  • Disk I/O
  • Waiting for user input
  • Memory paging
  • Sharing of the CPU with other applications or threads

All of the above items have an overhead that is usually at least an order of magnitude more than an access to a memory device.

The main reason for using dynamic memory instead of stack-based memory is size. Stack memory is mainly used for passing arguments and storing return addresses. Local variables that are not declared static are also placed on the stack. Most programming environments give the stack a smaller size than the heap. Larger items can be placed on the heap or declared as static (and placed in the same area as globals).
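A minimal sketch of the two alternatives for a large item follows. The size `kLargeCount` and the function names are assumptions chosen for illustration; the actual stack limit depends on your platform and settings.

```cpp
#include <cstddef>
#include <vector>

// Assumed size for illustration: 4 Mi ints (16 MiB when int is 4 bytes),
// which is larger than many default stack limits.
constexpr std::size_t kLargeCount = 4 * 1024 * 1024;

// Option 1: dynamic memory. std::vector keeps its elements on the heap
// and releases them automatically when the vector is destroyed.
std::vector<int> make_heap_buffer() {
    return std::vector<int>(kLargeCount, 0);
}

// Option 2: static storage. A function-local static array lives in the
// same area as globals rather than on the stack, and is zero-initialized.
int* static_buffer() {
    static int buffer[kLargeCount];
    return buffer;
}
```

Declaring `int buffer[kLargeCount];` as an ordinary local variable instead would place all of it on the stack, which may overflow it.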

Worry more about correctness than memory performance. Profile when in doubt.
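If you want a quick measurement before reaching for a real profiler, a timing helper along these lines may be enough. This is only a sketch: `Timing` and `time_sum` are hypothetical names, and numbers from a micro-benchmark like this are noisy and depend heavily on optimization flags.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>

// Result of one measurement: the computed sum (returned so the work
// cannot be optimized away entirely) and the elapsed time.
struct Timing {
    long long sum;
    double microseconds;
};

// Sums `count` ints starting at `data`, `repeats` times over, and
// measures the elapsed wall-clock time with a monotonic clock.
Timing time_sum(const int* data, std::size_t count, int repeats) {
    auto start = std::chrono::steady_clock::now();
    long long total = 0;
    for (int r = 0; r < repeats; ++r) {
        total += std::accumulate(data, data + count, 0LL);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::micro> elapsed = stop - start;
    return {total, elapsed.count()};
}
```

Calling `time_sum(stack_array, 100, repeats)` and `time_sum(heap_array, 100, repeats)` with the arrays from the question, and a large `repeats`, gives two timings to compare.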

Edit 1: Cache misses
A cache miss is when the processor looks in its data cache and doesn't find the item; the processor must then fetch the item from external memory (i.e., reload the cache).

For most applications, the performance cost of cache misses is negligible, usually measured in small nanosecond values. They are not noticeable unless your program is computationally intensive or processes a huge amount of data.

Branch instructions may take up more execution time overall than cache misses. A mispredicted conditional branch can force the processor to flush and reload the instruction pipeline and reload the Program Counter register. (Note: some processors can haul executable loop code into the instruction cache and reduce the penalty of the branch effects.)
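One way a conditional branch inside a loop can sometimes be avoided is to use the result of the comparison as a number. The sketch below uses hypothetical function names and is not a guaranteed speedup: optimizing compilers often perform this transformation on their own, so measure before relying on it.

```cpp
#include <cstddef>

// Branching version: as written, each element needs a conditional jump.
std::size_t count_above_branchy(const int* data, std::size_t n, int threshold) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] > threshold) {
            ++count;
        }
    }
    return count;
}

// Arithmetic version: the comparison yields a bool, which converts to
// 0 or 1, so the loop body has no conditional jump on the data.
std::size_t count_above_arith(const int* data, std::size_t n, int threshold) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += static_cast<std::size_t>(data[i] > threshold);
    }
    return count;
}
```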

You can organize your data to reduce the number of cache misses. Search the web for "data driven" or "data optimizations". Also try reducing branches by applying algebra, Boolean algebra, and factoring invariants out of loops.
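As one illustration of organizing data access around the cache, consider a matrix stored row by row in a single `std::vector`. This is a sketch with made-up function names; both functions compute the same sum, but the first walks memory sequentially while the second jumps `cols` elements between accesses, which tends to cause more cache misses once the matrix is larger than the cache.

```cpp
#include <cstddef>
#include <vector>

// The matrix is assumed to be stored in row-major order:
// element (r, c) is at index r * cols + c.

// Visits elements in the order they are laid out in memory.
long long sum_row_order(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Visits elements column by column, striding `cols` elements at a time.
long long sum_column_order(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```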


 