What is aliasing and how does it affect performance?

At the GoingNative event, during the Interactive Panel on Day 2 at the 9 minute mark, Chandler Carruth says:

Pointers create aliasing problems. They slow down your binaries not speed them up.

What does this mean? Can this be illustrated with a (simple) example?

Aliasing affects performance by preventing the compiler from doing certain optimizations. For example:

void foo(int *array,int *size,int *value) {
    for(int i=0;i<*size;++i) {
        array[i] = 2 * *value;
    }
}

Looking at this code, you might expect that the compiler could load *value once outside the loop and then set every element in the array to that value very quickly. But this isn't the case, due to aliasing. Because *value could be an alias for an element of the array, it could change on any given iteration. Therefore the code has to load the value on every single iteration, resulting in a potentially large slowdown. The same applies to *size, which also has to be re-read for every loop test.

If the variables could not alias, then the above code would be equivalent to the following:

void foo(int *array,int size,int value) {
    for(int i=0;i<size;++i) {
        array[i] = 2 * value;
    }
}

Using LLVM's online demo to get the generated code, here are the different results:

1) With aliasing

foo:                                    # @foo
    .cfi_startproc
# BB#0:
    cmpl    $0, (%rsi)
    jle .LBB0_3
# BB#1:
    xorl    %eax, %eax
    .align  16, 0x90
.LBB0_2:                                # %.lr.ph
                                        # =>This Inner Loop Header: Depth=1
    movl    (%rdx), %ecx
    addl    %ecx, %ecx
    movl    %ecx, (%rdi,%rax,4)
    incq    %rax
    cmpl    (%rsi), %eax
    jl  .LBB0_2
.LBB0_3:                                # %._crit_edge
    ret
    .size   foo, .Ltmp1-foo
    .cfi_endproc
.Leh_func_end0:

2) Without aliasing

foo:                                    # @foo
    .cfi_startproc
# BB#0:
    testl   %esi, %esi
    jle .LBB0_3
# BB#1:                                 # %.lr.ph
    addl    %edx, %edx
    .align  16, 0x90
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
    movl    %edx, (%rdi)
    addq    $4, %rdi
    decl    %esi
    jne .LBB0_2
.LBB0_3:                                # %._crit_edge
    ret
    .size   foo, .Ltmp1-foo
    .cfi_endproc
.Leh_func_end0:

You can see that the version with aliasing has to do more work in the loop body (between the labels .LBB0_2 and .LBB0_3): it reloads *value from memory and re-reads *size for the comparison on every iteration.
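To make the aliasing case concrete, here is a minimal sketch in which value deliberately points at an element of array. The helper names foo_hoisted and demo_alias, and the input values, are made up for illustration; foo is the pointer-taking function from above. It is only meant to show that a hoisted load gives a different answer when the pointers overlap, which is why the compiler may not hoist on its own.

```c
#include <assert.h>

/* The pointer-taking function from the answer: *value is re-read each iteration. */
void foo(int *array, int *size, int *value) {
    for (int i = 0; i < *size; ++i) {
        array[i] = 2 * *value;
    }
}

/* Hypothetical hand-hoisted variant: reads *size and *value exactly once.
   Only equivalent to foo when neither pointer points into array. */
void foo_hoisted(int *array, int *size, int *value) {
    int n = *size;
    int doubled = 2 * *value;
    for (int i = 0; i < n; ++i) {
        array[i] = doubled;
    }
}

/* Runs both variants with value aliasing element 1 of the array.
   Returns 1 if the two variants produced different results. */
int demo_alias(void) {
    int n = 4;
    int plain[4] = {1, 1, 1, 1};
    int hoisted[4] = {1, 1, 1, 1};

    foo(plain, &n, &plain[1]);
    foo_hoisted(hoisted, &n, &hoisted[1]);

    /* foo: element 1 becomes 2 on iteration 1, so later elements get 2 * 2. */
    assert(plain[0] == 2 && plain[1] == 2 && plain[2] == 4 && plain[3] == 4);
    /* foo_hoisted: 2 * 1 was computed once, before any store. */
    assert(hoisted[0] == 2 && hoisted[1] == 2 && hoisted[2] == 2 && hoisted[3] == 2);

    return plain[2] != hoisted[2];
}
```

Copying the pointed-to values into locals, as foo_hoisted does, is one way to get the tighter loop by hand when you know the arguments never overlap.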

The type of problem Chandler was talking about can be easily illustrated with a simplified strcpy:

char *stpcpy (char * dest, const char * src);

When writing an implementation of this, you might assume that the memory pointed to by dest is completely separate from the memory pointed to by src. You (or the compiler) might want to optimize it by reading a block of characters from the string pointed to by src, and writing all of them at once into dest. But if dest pointed to one byte ahead of src, the behaviour of this would differ from a simple character-by-character copy.

Here the aliasing problem is that src can alias dest, and the generated code must be made less efficient than it could be if src wasn't allowed to alias dest.

The real strcpy uses an extra keyword, restrict (which is technically only part of C, not C++), that tells the compiler to assume that src and dest do not overlap, and this allows the compiler to generate much more efficient code.
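As a rough sketch of what such a declaration looks like in C99 or later, here is a byte-by-byte copy with restrict-qualified parameters. The names copy_string and demo_copy are hypothetical and this is not how any particular C library implements strcpy; it only shows where the qualifier goes and what the caller is promising.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplified copy. Returns a pointer to the terminating
   '\0' written into dest. The restrict qualifiers are a promise by the
   caller that dest and src do not overlap; the compiler may rely on it. */
char *copy_string(char *restrict dest, const char *restrict src) {
    while ((*dest = *src) != '\0') {
        ++dest;
        ++src;
    }
    return dest;
}

/* Copies a short literal into a separate buffer, so the no-overlap
   promise holds. Returns the number of characters copied. */
int demo_copy(void) {
    char buffer[8];
    char *end = copy_string(buffer, "abc");

    assert(strcmp(buffer, "abc") == 0);
    assert(end == buffer + 3);

    return (int)(end - buffer);
}
```

Calling a function like this with overlapping buffers breaks the promise and is undefined behaviour; for overlapping regions of known length, memmove is the standard routine that is specified to work.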


Here's an even simpler example where we can see a big difference in the assembly:

void my_function_1(int* a, int* b, int* c) {
    if (*a) *b = *a;
    if (*a) *c = *a;
}

void my_function_2(int* __restrict a, int* __restrict b, int* __restrict c) {
    if (*a) *b = *a;
    if (*a) *c = *a;
}

Assume that this is a simplification of a function where it actually made sense to use two if-statements rather than just if (*a) { *b=*a; *c=*a; }, but the intent is the same.

We may assume when writing this that a != b, because there's some reason why it would make no sense for my_function_1 to be used like that. But the compiler can't assume that, and does a store to b and a reload of a from memory before executing the second line, to cover the case where b == a:

0000000000400550 <my_function_1>:
  400550:       8b 07                   mov    (%rdi),%eax
  400552:       85 c0                   test   %eax,%eax                 <= if (*a)
  400554:       74 0a                   je     400560 <my_function_1+0x10>
  400556:       89 06                   mov    %eax,(%rsi)
  400558:       8b 07                   mov    (%rdi),%eax
  40055a:       85 c0                   test   %eax,%eax                 <= if (*a)
  40055c:       74 02                   je     400560 <my_function_1+0x10>
  40055e:       89 02                   mov    %eax,(%rdx)
  400560:       f3 c3                   repz retq

If we remove the potential for aliasing by adding __restrict, the compiler generates shorter and faster code:

0000000000400570 <my_function_2>:
  400570:       8b 07                   mov    (%rdi),%eax
  400572:       85 c0                   test   %eax,%eax
  400574:       74 04                   je     40057a <_Z9my_function_2PiS_S_+0xa>
  400576:       89 06                   mov    %eax,(%rsi)
  400578:       89 02                   mov    %eax,(%rdx)
  40057a:       f3 c3                   repz retq
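Note that __restrict is a promise made by the caller, not something the compiler checks. The sketch below, with a hypothetical demo_distinct helper and made-up values, calls both functions with three separate objects, which is the situation the promise describes; in that case the two versions behave identically. __restrict is a compiler extension spelling accepted by GCC, Clang and MSVC; in standard C the keyword is restrict.

```c
#include <assert.h>

void my_function_1(int *a, int *b, int *c) {
    if (*a) *b = *a;
    if (*a) *c = *a;
}

void my_function_2(int *__restrict a, int *__restrict b, int *__restrict c) {
    if (*a) *b = *a;
    if (*a) *c = *a;
}

/* Calls both versions with three distinct objects each, so the
   no-aliasing promise of my_function_2 is kept.
   Returns 1 if both versions produced the same results. */
int demo_distinct(void) {
    int a1 = 7, b1 = 0, c1 = 0;
    int a2 = 7, b2 = 0, c2 = 0;

    my_function_1(&a1, &b1, &c1);
    my_function_2(&a2, &b2, &c2);

    assert(b1 == 7 && c1 == 7);
    assert(b2 == 7 && c2 == 7);

    return b1 == b2 && c1 == c2;
}
```

Passing overlapping pointers to my_function_2 would break the promise, and the behaviour would then be undefined.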

Consider the following function:

void f(float* lhs, float* rhs, float* out, int size) {
    for(int i = 0; i < size; i++) {
        out[i] = *lhs + *rhs;
    }
}

What's the fastest version of this function? Probably one where you hoist *lhs + *rhs out of the loop. The problem is what happens when the pointers alias. Imagine what that optimization does if I call it like this:

float arr[6] = { ... };
f(arr, arr + 1, arr, 6);

As you can see, the problem is that *lhs + *rhs cannot be hoisted out of the loop, because writing to out[i] modifies their values. In fact, the compiler can't hoist any logic out of the loop. So the compiler cannot perform the "obvious" optimization, because if the parameters alias, the hoisted logic would be incorrect. However, if the floats are taken by value, then the compiler knows they can't alias and can perform the hoist.

Of course, this function is pretty silly, but it demonstrates the point.
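Here is a small sketch of that call with concrete numbers. The array contents, the f_hoisted variant and the demo_f helper are assumptions added for illustration; f is the function above. It compares the loop as written against a hand-hoisted version when out overlaps lhs and rhs.

```c
#include <assert.h>

/* The function from the answer: the sum is recomputed on every iteration. */
void f(float *lhs, float *rhs, float *out, int size) {
    for (int i = 0; i < size; i++) {
        out[i] = *lhs + *rhs;
    }
}

/* Hypothetical hand-hoisted variant: the sum is computed once.
   Only equivalent to f when out does not overlap lhs or rhs. */
void f_hoisted(float *lhs, float *rhs, float *out, int size) {
    float sum = *lhs + *rhs;
    for (int i = 0; i < size; i++) {
        out[i] = sum;
    }
}

/* Calls both variants the way the answer does: f(arr, arr + 1, arr, 6).
   Returns 1 if the two variants produced different results. */
int demo_f(void) {
    float plain[6] = {1, 2, 3, 4, 5, 6};
    float hoisted[6] = {1, 2, 3, 4, 5, 6};

    f(plain, plain + 1, plain, 6);
    f_hoisted(hoisted, hoisted + 1, hoisted, 6);

    /* f: out[0] overwrites *lhs with 3, out[1] overwrites *rhs with 3 + 2,
       and every later element is 3 + 5. */
    assert(plain[0] == 3.0f && plain[1] == 5.0f);
    assert(plain[2] == 8.0f && plain[5] == 8.0f);
    /* f_hoisted: 1 + 2 was computed before any store. */
    assert(hoisted[0] == 3.0f && hoisted[5] == 3.0f);

    return plain[5] != hoisted[5];
}
```

Because the two results differ, a compiler that cannot rule out the overlap has to keep the loads inside the loop.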

A pointer is a value that represents a memory address. Sometimes two pointers can represent the same memory address; that is what aliasing is.

int x;
int * p = &x;   /* p must point at a real object before *p is written */
*p = 5;

int * alias;
alias = p;

The variable alias is an alias of p, and *alias is equal to 5. If you change *alias, then *p changes along with it.
