简体   繁体   中英

Performance Impact of Nested Vectors vs. Contiguous Arrays

Have there been any reliable tests that clearly display the performance differences between accessing and writing to nested vectors versus C++'s built-in arrays? I've heard that using nested (multi-dimensional) vectors typically have some performance overhead as compared to accessing elements in a single array (where all elements are stored in contiguous memory), but this just all seems to be hypothetical to me. I have yet to see any tests that actually show these differences. Are they significant? I'm sure that it depends on the scenario, but as an inexperienced programmer, I'm not quite sure at what level these differences do become significant.

It definitely depends on the scenario, to the extent that I don't think it's possible to answer in a general way which approach is fastest. The fastest approach is going to be the one where the access patterns have the best data locality - which depends highly on the access pattern as well as how the structures are laid out in memory, which in the case of nested vectors is dependent on the allocator and probably varies quite a bit between compilers.

I'd follow the general rule of optimization, which is to first write things in the most straightforward way and then attempt optimization when you can prove there is a bottleneck.

Two things contribute to the runtime differences between nested and flattened arrays:
Caching behaviour and indirection

  • CPUs use a hierarchy of Caches to avoid accessing the RAM directly too frequently. This exploits the fact that most memory accesses are either contiguous or have a certain temporal locality , ie what was accessed recently will be accessed again soon.
    This means that if the innermost arrays of your nested array is rather large, you will notice small to no differences to a flat array if you iterate over the values in a contiguous fashion . This means that when iterating over a range of values, for flat arrays, your innermost loop should iterate over consecutive elements, for nested arrays, your innermost loop should iterate over the innermost array.
  • If however your access patterns are random, the most important difference in timing are indirections:
    For a flat array, you use something like A[(Z * M + Y) * N + X] , so you do 4 arithmetic operations and then a memory access.
    For a nested array, you use A[Z][Y][X] , so there are actually three interdependent memory accesses: You need to know A[Z] before you can access A[Z][Y] and so on. Because of the superscalar architecture of modern CPUs, operations that can be executed in parallel are especially efficient, interdependent operations not so much. So you have some arithmetic operations and a memory load on the one side and three interdependent loads on the other side, which is significantly slower. It might be possible that for nested arrays, the contents of A and also A[Z] for some values of Z can be found in the cache hierarchy, but if your nested array is sufficiently large, it will never fit completely into the cache, thus leading to multiple cache misses and memory loads (nested) instead of just a single cache miss and load (flat) for a single random access into the array.

Also see his question , especially the shorter answers below for a more detailed discussion of caching (my answer) and indirection (Peter's answer), which also provides an example where there are no noticeable differences between nested and flat arrays (after fixing the indexing bug of course ;) )

So if you want to know if there are significant runtime differences between them, my answer would be:

  • If you do random access, you will definitely notice the multiple indirections, thus leading to a larger runtime for nested arrays.

  • If you do contiguous access and use the correct ordering of the loops (innermost loop = innermost array/innermost index for flat array) and your innermost dimension of the multi-dimensional array is large enough, then the difference will be negligable, since the compiler will be able to move all indirections out of the innermost loop.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM