C structure pointer dereferencing speed

I have a question regarding the speed of pointer dereferencing. I have a structure like this:

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT {
  double left;
  double top;
  double right;
  double bottom;
};

My question is: which of these would be faster, and why?


CASE 1:

TD_RECT *pRect;
...
for(i = 0; i < m; i++)
{
   if(p[i].x < pRect->left) ...
   if(p[i].x > pRect->right) ...
   if(p[i].y < pRect->top) ...
   if(p[i].y > pRect->bottom) ...
}

CASE 2:

TD_RECT *pRect;
double left = pRect->left;
double top = pRect->top;
double right = pRect->right;
double bottom = pRect->bottom;
...
for(i = 0; i < m; i++)
{
   if(p[i].x < left) ...
   if(p[i].x > right) ...
   if(p[i].y < top) ...
   if(p[i].y > bottom) ...
}

So in case 1, the loop dereferences the pRect pointer directly to obtain the comparison values. In case 2, new local variables are created in the function's local space (on the stack) and the values are copied from pRect into them. The loop will perform many comparisons.

My guess is that they would be equally slow, because a local variable is also a memory reference on the stack, but I'm not sure...

Also, would it be better to keep referencing p[] by index, or to increment p by one element each iteration and dereference it directly without an index?

Any ideas? Thanks :)

You'll probably find it won't make a difference with modern compilers. Most of them would probably perform common subexpression elimination on the expressions that don't change within the loop. It's not wise to assume that there's a simple one-to-one mapping between your C statements and the assembly code. I've seen gcc pump out code that would put my assembler skills to shame.

But this is neither a C nor a C++ question, since the ISO standard doesn't mandate how it's done. The best way to check for sure is to generate the assembler code with something like gcc -S and examine the two cases in detail.
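As a minimal sketch of that approach, both cases can be placed in one file and compiled with gcc -O2 -S so their assembly can be compared side by side. The file name, the POINT_D type, the function names, and the choice to count failed bound checks in place of the question's elided "..." bodies are all assumptions made for illustration.

```c
/* rect_cmp.c (hypothetical file name): both loops from the question in one
 * translation unit, so "gcc -O2 -S rect_cmp.c" emits them side by side. */
#include <stddef.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT {
  double left;
  double top;
  double right;
  double bottom;
};

/* Hypothetical point type; the question does not show how p is declared. */
typedef struct { double x, y; } POINT_D;

/* CASE 1: read the bounds through pRect on every iteration.
 * Returns how many of the four bound checks fail (a stand-in for "..."). */
size_t count_violations_case1(const POINT_D *p, size_t m, const TD_RECT *pRect)
{
  size_t n = 0;
  for (size_t i = 0; i < m; i++) {
    if (p[i].x < pRect->left) n++;
    if (p[i].x > pRect->right) n++;
    if (p[i].y < pRect->top) n++;
    if (p[i].y > pRect->bottom) n++;
  }
  return n;
}

/* CASE 2: copy the bounds into locals once, before the loop. */
size_t count_violations_case2(const POINT_D *p, size_t m, const TD_RECT *pRect)
{
  double left = pRect->left;
  double top = pRect->top;
  double right = pRect->right;
  double bottom = pRect->bottom;
  size_t n = 0;
  for (size_t i = 0; i < m; i++) {
    if (p[i].x < left) n++;
    if (p[i].x > right) n++;
    if (p[i].y < top) n++;
    if (p[i].y > bottom) n++;
  }
  return n;
}
```

Because these loop bodies only read memory, an optimising compiler is free to hoist the pRect loads out of the CASE 1 loop; whether it actually does so is exactly what the generated .s file will show. Comparing an -O0 build against the -O2 build can also be instructive.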

You'll also get more return on your investment if you steer away from this sort of micro-optimisation and concentrate on the macro level, such as algorithm selection.

And, as with all optimisation questions, measure, don't guess! There are too many variables that can affect it, so you should benchmark the different approaches in the target environment, with realistic data.
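A rough timing harness in that spirit might look like the sketch below. It relies only on the standard clock() function from <time.h>; the rect_variant signature, the count_inside example variant, and the other names are hypothetical, and you would plug in your own variants and realistic data. Build it with the same optimisation flags you actually ship with.

```c
#include <stddef.h>
#include <time.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left, top, right, bottom; };

/* Hypothetical point type; the question does not show how p is declared. */
typedef struct { double x, y; } POINT_D;

/* Hypothetical common signature for the variants being compared. */
typedef size_t (*rect_variant)(const POINT_D *p, size_t m, const TD_RECT *pRect);

/* Example variant (CASE 2 style): counts points inside the rect,
 * boundary included. */
size_t count_inside(const POINT_D *p, size_t m, const TD_RECT *pRect)
{
  const double left = pRect->left, top = pRect->top;
  const double right = pRect->right, bottom = pRect->bottom;
  size_t n = 0;
  for (size_t i = 0; i < m; i++)
    if (p[i].x >= left && p[i].x <= right && p[i].y >= top && p[i].y <= bottom)
      n++;
  return n;
}

/* Calls fn `reps` times on the same data and returns the processor time
 * used, in seconds, as measured by clock(). Results are accumulated into
 * the volatile *sink so the optimiser cannot discard the calls. */
double time_variant(rect_variant fn, const POINT_D *p, size_t m,
                    const TD_RECT *pRect, int reps, volatile size_t *sink)
{
  clock_t start = clock();
  for (int r = 0; r < reps; r++)
    *sink += fn(p, m, pRect);
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_variant once per variant with identical data, choose reps large enough that the times sit well above clock()'s resolution, and repeat the whole run a few times, since a single measurement can be noisy.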

It is unlikely to be a performance-critical difference. You could profile each option over multiple runs and see. Make sure compiler optimisations are enabled in the test.

With regard to storing the doubles, you might get some performance benefit by declaring them const. How big is your array?

With regard to using pointer arithmetic: yes, this can be faster.
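The two forms the question asks about can be written side by side as below; the POINT_D type and both function names are hypothetical. They compute the same count, and an optimising compiler may well turn them into very similar machine code, so inspecting the assembly or timing them is the way to find out for a given compiler and target.

```c
#include <stddef.h>

typedef struct { double x, y; } POINT_D;  /* hypothetical point type */

/* Indexed form: address p[i] on every iteration. */
size_t count_left_of_indexed(const POINT_D *p, size_t m, double left)
{
  size_t n = 0;
  for (size_t i = 0; i < m; i++)
    if (p[i].x < left) n++;
  return n;
}

/* Pointer-increment form: advance p one element at a time until it
 * reaches one past the last element. */
size_t count_left_of_pointer(const POINT_D *p, size_t m, double left)
{
  size_t n = 0;
  for (const POINT_D *end = p + m; p != end; p++)
    if (p->x < left) n++;
  return n;
}
```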

You can optimise straight away if you know that left < right in your rect (surely it must be): if x < left, it can't also be > right, so you can put in an "else".
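A sketch of the "else" idea, assuming (as the answer does) that left <= right and, following the question's comparisons, that top <= bottom. The assert documents that assumption in debug builds. The POINT_D type and the count_outside name are hypothetical, and counting outside points is only a stand-in for whatever the question's "..." bodies do.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left, top, right, bottom; };

typedef struct { double x, y; } POINT_D;  /* hypothetical point type */

/* Counts points outside the rect. Because left <= right, a point with
 * x < left cannot also have x > right, so "else" skips the second test
 * whenever the first one succeeds; likewise for top/bottom. */
size_t count_outside(const POINT_D *p, size_t m, const TD_RECT *pRect)
{
  assert(pRect->left <= pRect->right && pRect->top <= pRect->bottom);
  size_t n = 0;
  for (size_t i = 0; i < m; i++) {
    int outside = 0;
    if (p[i].x < pRect->left) outside = 1;
    else if (p[i].x > pRect->right) outside = 1;
    if (p[i].y < pRect->top) outside = 1;
    else if (p[i].y > pRect->bottom) outside = 1;
    n += (size_t)outside;
  }
  return n;
}
```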

Your big optimisation, if there is one, would come from not having to loop through all the items in your array and perform 4 checks on every one of them.

For example, if you indexed or sorted your array on x and y, you would be able to use binary search to find all values that have x < left and loop through just those.
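One way to realise the sorting idea is sketched below, assuming the points can be sorted once by x and then queried many times. qsort from <stdlib.h> does the sorting, and a lower-bound binary search finds the first point with x >= left, so exactly the points before that index have x < left. The function names are hypothetical, and indexing on y as well would need a second ordering or a spatial index, which is not shown here.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, y; } POINT_D;  /* hypothetical point type */

/* qsort comparator: ascending by x. Returns -1, 0 or 1 instead of
 * converting a double difference to int. NaNs are not handled. */
static int cmp_by_x(const void *a, const void *b)
{
  double xa = ((const POINT_D *)a)->x;
  double xb = ((const POINT_D *)b)->x;
  return (xa > xb) - (xa < xb);
}

/* Sorts the points by x, once, in O(m log m). */
void sort_by_x(POINT_D *p, size_t m)
{
  qsort(p, m, sizeof *p, cmp_by_x);
}

/* Lower bound on an array sorted by x: returns the index of the first
 * point with x >= left, so points [0, result) are exactly those with
 * x < left. Runs in O(log m). */
size_t first_not_left_of(const POINT_D *p, size_t m, double left)
{
  size_t lo = 0, hi = m;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (p[mid].x < left)
      lo = mid + 1;
    else
      hi = mid;
  }
  return lo;
}
```

With k = first_not_left_of(p, m, pRect->left), a loop over indices 0 to k - 1 visits only the points left of the rectangle. Since the sort itself costs O(m log m), this only pays off when the same array is queried repeatedly.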

I think the second case is likely to be faster, because you are not dereferencing the pointer pRect on every loop iteration.

In practice, an optimising compiler may notice this, and there might be no difference in the generated code, but the possibility that pRect is an alias of an item in p[] could prevent this: if the loop body writes to memory, a write through p could change *pRect, so the compiler would have to reload its fields.
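To illustrate the aliasing concern, the sketch below has the loop store results through a plain double pointer, which the compiler generally has to assume could point into *pRect. The C99 restrict qualifier, which the answer does not mention, is one way to promise that the objects do not overlap; copying the bounds into locals as in CASE 2 is another, since a local whose address is never taken cannot be aliased. The x_overshoot function and the POINT_D type are hypothetical.

```c
#include <stddef.h>

typedef struct _TD_RECT TD_RECT;
struct _TD_RECT { double left, top, right, bottom; };

typedef struct { double x, y; } POINT_D;  /* hypothetical point type */

/* Writes, for each point, how far its x coordinate lies outside
 * [left, right] (0.0 if inside). Without restrict, out[i] could legally
 * point at pRect->left or pRect->right, so each store might change the
 * bounds and force a reload. restrict (C99) promises no such overlap,
 * which allows the bounds to stay in registers for the whole loop. */
void x_overshoot(const POINT_D *restrict p, size_t m,
                 const TD_RECT *restrict pRect, double *restrict out)
{
  for (size_t i = 0; i < m; i++) {
    if (p[i].x < pRect->left)
      out[i] = pRect->left - p[i].x;
    else if (p[i].x > pRect->right)
      out[i] = p[i].x - pRect->right;
    else
      out[i] = 0.0;
  }
}
```

Note that restrict is a promise from the programmer, not something the compiler checks: calling x_overshoot with out pointing into the rect would be undefined behaviour.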

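An optimising compiler will see that the structure accesses are loop-invariant and apply loop-invariant code motion, making the two cases look the same.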

I would be surprised if even a totally non-optimised compile (-O0) produced different code for the two cases presented. To perform any operation on a modern processor, the data need to be loaded into registers. So even when you declare automatic variables, these variables will not exist in main memory but rather in one of the processor's floating-point registers. The same is true when you do not declare the variables yourself, so I expect no difference in the generated machine code even when you declare the temporary variables in your C++ code.

But as others have said, compile the code into assembler and see for yourself. 但正如其他人所说,将代码编译成汇编程序并亲自查看。
