What is the theoretical impact of direct index access with "high" memory usage vs. "shifted" index access with "low" memory usage?

I am really curious which practice is better to keep. I know it probably makes no performance difference at all (perhaps not even in performance-critical applications), but I am more curious about the impact on the generated code with optimization in mind (and, for the sake of completeness, also on performance, if it makes any difference).

So the problem is as follows:

Element indexes range from A to B, where A > 0 and B > A (e.g., A = 1000 and B = 2000).

There are a few possible ways to store information about each element. Two of them use plain arrays: direct index access, and access by manipulating the index:

Example 1

//declare the array with less memory, "just" 1000 elements, all elements used
std::array<T, B-A> Foo;
//but make accessing by index slower?
//accessing index N where B > N >= A
Foo[N-A];

Example 2

//or declare the array with more memory, 2000 elements, 50% elements not used, not very "efficient" for memory
std::array<T, B> Foo;
//but make accessing by index faster?
//accessing index N where B > N >= A
Foo[N];

I'd personally go for #2 because I really like performance, but I wonder what happens in reality:

  • Will the compiler take care of both situations?
  • What is the impact on optimizations?
  • What about performance?
  • Does it matter at all?
  • Or is this just the next "micro optimization" that no human being should worry about?
  • Is there a recommended tradeoff ratio between memory usage and speed?

Accessing any array with an index involves multiplying the index by the element size and adding the result to the base address of the array itself.

Since we are already adding one number to another, the adjustment for Foo[N-A] could easily be made by moving the base address down by A * sizeof(T) before adding N * sizeof(T), rather than actually calculating (N-A) * sizeof(T).

In other words, any decent compiler should completely hide this subtraction, assuming A is a constant value.
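To make the folding argument concrete, here is a minimal sketch. `OffsetArray`, `naive_address` and `folded_address` are hypothetical names invented for illustration, not anything from the question. The two address functions use plain integer arithmetic to show that `base + (N - A) * sizeof(T)` and `(base - A * sizeof(T)) + N * sizeof(T)` give the same result. Whether a particular compiler really emits the folded form is something to check in its assembly output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper: stores B - A elements, accepts indices in [A, B).
template <typename T, std::size_t A, std::size_t B>
class OffsetArray {
    static_assert(B > A, "index range must be non-empty");
    std::array<T, B - A> data_{};

public:
    T& operator[](std::size_t n) {
        assert(n >= A && n < B);
        return data_[n - A];  // A is a compile-time constant here
    }
    const T& operator[](std::size_t n) const {
        assert(n >= A && n < B);
        return data_[n - A];
    }
    static constexpr std::size_t size() { return B - A; }
    const T* base() const { return data_.data(); }
};

// Address of element n computed the obvious way: base + (n - A) * sizeof(T).
template <typename T, std::size_t A, std::size_t B>
std::uintptr_t naive_address(const OffsetArray<T, A, B>& arr, std::size_t n) {
    const auto base = reinterpret_cast<std::uintptr_t>(arr.base());
    return base + (n - A) * sizeof(T);
}

// Same address with the constant folded into the base:
// (base - A * sizeof(T)) + n * sizeof(T).
template <typename T, std::size_t A, std::size_t B>
std::uintptr_t folded_address(const OffsetArray<T, A, B>& arr, std::size_t n) {
    const auto base = reinterpret_cast<std::uintptr_t>(arr.base());
    // Integer arithmetic only; the shifted base is never dereferenced.
    const std::uintptr_t shifted_base = base - A * sizeof(T);
    return shifted_base + n * sizeof(T);
}
```

With `OffsetArray<int, 1000, 2000> foo;`, the calls `naive_address(foo, 1500)` and `folded_address(foo, 1500)` should return the same value, which on common flat-memory platforms is also the numeric value of `&foo[1500]`.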

If it's not a constant (say you are using std::vector instead of std::array), then you will indeed subtract A from N at some point in the code. It is still pretty cheap to do this. Most modern processors can do this in one cycle with no latency for the result, so at worst it adds a single clock cycle to the access.
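For the non-constant case, a rough sketch of what the run-time subtraction looks like in source form follows. `OffsetVector` is a hypothetical helper made up for this example, and it assumes the caller passes `last >= first`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: covers indices [first, last), with the offset
// chosen at run time rather than at compile time.
template <typename T>
class OffsetVector {
    std::size_t first_;
    std::vector<T> data_;

public:
    // Assumes last >= first; allocates last - first value-initialized elements.
    OffsetVector(std::size_t first, std::size_t last)
        : first_(first), data_(last - first) {}

    T& operator[](std::size_t n) {
        assert(n >= first_ && n - first_ < data_.size());
        return data_[n - first_];  // one run-time subtraction per access
    }

    std::size_t size() const { return data_.size(); }
};
```

Usage would look like `OffsetVector<int> v(1000, 2000); v[1500] = 42;`, which touches only the 1000 allocated elements.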

Of course, if the numbers are 1000-2000, it probably makes very little difference in the whole scheme of things: either the total processing time is nearly nothing, or it is a lot because you do complicated stuff. But if you were to make it a million elements, offset by half a million, it may make the difference between a simple and a complex method of allocating them, or some such.

Also, as Hans Passant implies: on modern OSes with virtual memory handling, memory that isn't actually used doesn't get populated with "real memory". At work I was investigating a strange crash on a board that has 2GB of RAM, and when viewing the memory usage, it showed that this one application had allocated 3GB of virtual memory. This board does not have a swap disk (it's an embedded system). It turned out that some code was simply allocating large chunks of memory that weren't filled with anything, and it only stopped working when it reached 3GB (32-bit processor, 3+1GB memory split between user and kernel space). So even for LARGE lumps of memory, if you only use half of it, the unused half won't actually take up any RAM as long as you do not access it.

As ALWAYS when it comes to performance, compilers and such: if it's important, do not trust "the internet" to tell you the answer. Set up a test with the code you actually intend to use, using the actual compiler(s) and processor type(s) that you plan to produce your code with and for, and run benchmarks. Some compiler may well have a misfeature (on processor type XYZ9278) that makes it produce horrible code for a case that most other compilers handle "with no overhead at all".
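As a starting point for such a test, here is a minimal timing sketch for the two layouts from the question. The function names, the `std::uint32_t` element type and the `sink` parameter are assumptions made for illustration. The `sink` accumulator only reduces the chance that the optimizer discards the work, it does not rule it out. A serious benchmark needs warm-up runs, many repetitions, and a look at the generated assembly.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

constexpr std::size_t A = 1000;
constexpr std::size_t B = 2000;

// Example 1 layout: B - A elements, accessed with a shifted index.
std::uint64_t sum_shifted(const std::array<std::uint32_t, B - A>& foo) {
    std::uint64_t total = 0;
    for (std::size_t n = A; n < B; ++n) total += foo[n - A];
    return total;
}

// Example 2 layout: B elements, accessed directly; the first A are unused.
std::uint64_t sum_direct(const std::array<std::uint32_t, B>& foo) {
    std::uint64_t total = 0;
    for (std::size_t n = A; n < B; ++n) total += foo[n];
    return total;
}

// Calls fn() `repeats` times and returns the elapsed time in nanoseconds.
// Each result is added to `sink` so the calls have an observable effect.
template <typename Fn>
long long time_ns(Fn fn, int repeats, std::uint64_t& sink) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) sink += fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A caller would fill both arrays with the same values for indices A through B - 1, then compare `time_ns([&] { return sum_shifted(small); }, repeats, sink)` against the same call with `sum_direct(big)`, built with the real compiler flags and run on the real target.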


 