
What is the most efficient way to loop over an array? (C++)

This is sort of a silly question, but it's been bothering me and I couldn't google-fu my way to an answer.

Consider the following array:

#include <cstdint>  // for uint64_t

struct SomeDataStruct
{
    uint64_t ValueOne;
    uint64_t ValueTwo;
    uint64_t ValueThree;
};

SomeDataStruct _veryLargeArray[1024];

Now, which of these approaches is faster for looping over every element and doing something with each one?

Approach 1:

for (int i = 0; i < 1024; ++i)
{
    _veryLargeArray[i].ValueOne += 1;
    _veryLargeArray[i].ValueTwo += 1;
    _veryLargeArray[i].ValueThree = _veryLargeArray[i].ValueOne + _veryLargeArray[i].ValueTwo;
}

Approach 2:

SomeDataStruct * pEndOfStruct = _veryLargeArray + 1024;  // one past the last element

for (SomeDataStruct * ptr = _veryLargeArray; ptr != pEndOfStruct; ptr += 1)
{
    ptr->ValueOne += 1;
    ptr->ValueTwo += 1;
    ptr->ValueThree = ptr->ValueOne + ptr->ValueTwo;
}

I know the question seems really stupid on its surface, but what I'm wondering is: does the compiler do anything smart or special with either way of writing the for loop? In the first case, it could be really memory intensive if the compiler actually computed BaseArrayPointer + Offset on every access, but if the compiler is smart enough it will fill the L2 cache with the entire array and treat the code between the { }'s correctly.

The second way avoids the problem if the compiler is resolving the pointer every time, but it probably makes it really hard for the compiler to figure out that it could copy the entire array into the L2 cache and walk it there.

Sorry for such a silly question. I'm having a lot of fun learning C++ and have started doing that thing where you overthink everything. Just curious whether anyone knows if there is a "definitive" answer.

Unless you want to look at the intermediate assembly language output and analyze the caching behaviour of the CPU, the only way you'll be able to answer this question is to profile the code. Run it hundreds or thousands of times and see how long it takes.
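As a rough sketch of what "run it and see how long it takes" might look like, the two loops from the question can be wrapped in functions and timed with std::chrono::steady_clock. The names UpdateByIndex, UpdateByPointer, TimeNanoseconds and g_data are made up for this illustration and are not from the question. The numbers depend heavily on the compiler, the optimization flags and the machine, and an optimizer may merge or simplify the repeated passes, so treat any result as a hint rather than proof.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

struct SomeDataStruct
{
    uint64_t ValueOne;
    uint64_t ValueTwo;
    uint64_t ValueThree;
};

constexpr std::size_t kCount = 1024;
static SomeDataStruct g_data[kCount];  // static storage, so zero-initialized

// Approach 1 from the question: index-based loop.
void UpdateByIndex(SomeDataStruct* data, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        data[i].ValueOne += 1;
        data[i].ValueTwo += 1;
        data[i].ValueThree = data[i].ValueOne + data[i].ValueTwo;
    }
}

// Approach 2 from the question: pointer-based loop.
void UpdateByPointer(SomeDataStruct* data, std::size_t count)
{
    SomeDataStruct* end = data + count;  // one past the last element
    for (SomeDataStruct* ptr = data; ptr != end; ++ptr)
    {
        ptr->ValueOne += 1;
        ptr->ValueTwo += 1;
        ptr->ValueThree = ptr->ValueOne + ptr->ValueTwo;
    }
}

// Hypothetical helper: runs fn over g_data `repeats` times and returns
// the total elapsed wall-clock time in nanoseconds.
template <typename Fn>
long long TimeNanoseconds(Fn fn, int repeats)
{
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
    {
        fn(g_data, kCount);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling TimeNanoseconds(UpdateByIndex, 100000) and TimeNanoseconds(UpdateByPointer, 100000) from main, then printing both totals along with a value from g_data so the work stays observable, gives a first comparison. Repeating the whole measurement several times helps smooth out noise.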

If you want the fastest code, write the simplest, most obvious version and leave it to the optimizing compiler. If you try to get fancy with a loop like this, you risk confusing the compiler so that it can't optimize things.
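One candidate for "the simplest, most obvious version" in C++11 and later is a range-based for loop, sketched below. This is an illustration rather than something from the question: the function name UpdateAll is made up, and the array is called veryLargeArray here because names beginning with an underscore are reserved in the global namespace.

```cpp
#include <cstdint>

struct SomeDataStruct
{
    uint64_t ValueOne;
    uint64_t ValueTwo;
    uint64_t ValueThree;
};

SomeDataStruct veryLargeArray[1024];  // global, so zero-initialized

// Visits every element exactly once. The array bound is taken from the
// array's type, so there is no index or end pointer to get wrong.
void UpdateAll()
{
    for (SomeDataStruct& item : veryLargeArray)
    {
        item.ValueOne += 1;
        item.ValueTwo += 1;
        item.ValueThree = item.ValueOne + item.ValueTwo;
    }
}
```

The loop states only what should happen to each element and leaves the iteration mechanics to the compiler.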

I've seen a simple C loop compile to code that ran faster than hand-coded assembly, and a hand-optimized C version that ended up slower than the hand-coded assembly.

On the other hand, it can pay to know a bit about caching and what is going on under the hood. But usually, that happens after you've discovered that your code isn't fast enough. Doing otherwise risks premature optimization, which is the root of all evil.
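As a small example of the "know a bit about caching" point, here is a back-of-the-envelope calculation of how much memory the array in the question occupies. The 64-byte cache line is an assumption (it is common on current x86-64 hardware but not universal), and the helper names are made up for illustration. Caches are filled by the hardware as memory is accessed, not by the compiler, and both loops in the question touch the same memory in the same sequential order.

```cpp
#include <cstddef>
#include <cstdint>

struct SomeDataStruct
{
    uint64_t ValueOne;
    uint64_t ValueTwo;
    uint64_t ValueThree;
};

constexpr std::size_t kElementCount = 1024;
constexpr std::size_t kAssumedCacheLineBytes = 64;  // assumption, not queried from the CPU

// Three uint64_t members are expected to pack with no padding.
static_assert(sizeof(SomeDataStruct) == 3 * sizeof(uint64_t),
              "unexpected padding in SomeDataStruct");

// Total size of the array in bytes.
constexpr std::size_t ArrayBytes()
{
    return sizeof(SomeDataStruct) * kElementCount;
}

// Number of cache lines a full sequential pass touches, rounded up and
// assuming the array starts on a cache-line boundary.
constexpr std::size_t CacheLinesTouched()
{
    return (ArrayBytes() + kAssumedCacheLineBytes - 1) / kAssumedCacheLineBytes;
}
```

With 8-byte uint64_t members this works out to 24 KiB, or 384 lines of 64 bytes. That is small, which is one more reason to measure before worrying about the loop style.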
