
Optimization in C

I've been trying to optimize some simple code using two kinds of optimization: loop unrolling and memory aliasing.
My original code:

int paint(char *dst, unsigned n, char *src, char bias)
{
    unsigned i;
    for (i=0;i<n;i++) {
        *dst++ = bias + *src++;
    }
    return 0;
}

My optimized code after loop unrolling:

int paint(char *dst, unsigned n, char *src, char bias)
{
    unsigned i;
    for (i=0;i<n;i+=2) {
        *dst++ = bias + *src++;
        *dst++ = bias + *src++;
    }
    return 0;
}

After this, how can I optimize the code using memory aliasing? Are there other good optimizations for this code (for example, casting the pointers to long pointers to copy faster)?
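A side note on the unrolled loop in the question: it steps by two, so an odd n makes it read and write one element past the end of the buffers. A minimal sketch of one way to handle the leftover element is shown here; the name paint_unrolled2 and the const qualifier on src are assumptions added for illustration, not part of the original post.

```c
/* Hypothetical variant of the question's unrolled loop that also
 * handles an odd element count. */
int paint_unrolled2(char *dst, unsigned n, const char *src, char bias)
{
    unsigned i;
    unsigned pairs = n / 2;      /* number of full two-element steps */

    for (i = 0; i < pairs; i++) {
        *dst++ = bias + *src++;
        *dst++ = bias + *src++;
    }
    if (n & 1u) {                /* one leftover element when n is odd */
        *dst = bias + *src;
    }
    return 0;
}
```

Whether this is any faster than the plain loop depends on the compiler and flags; it only illustrates the remainder handling.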

Optimization in C is easier than this.

cc -Wall -W -pedantic -O3 -march=native -flto source.c

That will unroll any loops that need to be unrolled. Manual unrolling, Duff's Device, and similar tricks are outdated and pretty useless.

As for aliasing, your function uses two char* parameters. If they are guaranteed never to point into the same array, you can use the restrict keyword. That allows the optimizer to assume more about the code and use vectorized instructions.
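As a rough sketch of the restrict suggestion, assuming C99 or later and that the caller really does guarantee dst and src never overlap, the function could look like this. The name paint_restrict and the const on src are illustrative additions, not from the original post.

```c
/* Sketch only: restrict promises the compiler that dst and src do not
 * alias. Calling this with overlapping buffers is undefined behavior. */
int paint_restrict(char *restrict dst, unsigned n,
                   const char *restrict src, char bias)
{
    unsigned i;
    for (i = 0; i < n; i++) {
        dst[i] = bias + src[i];
    }
    return 0;
}
```

With optimization enabled, the compiler is then free to vectorize the loop without emitting a runtime overlap check; comparing the generated assembly with and without restrict is the easiest way to see what it actually does on a given target.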

Check out the assembly produced here: https://godbolt.org/z/xMfebr or https://godbolt.org/z/j1xMYz

Can you manage to do all of that by hand? Probably not.

Are you only concerned about performance? What about correctness?

Judging by the name of your function, paint, and the variable bias (and using my crystal ball), I guess you need to add with saturation (in case of overflow). This can be done using intrinsics for paddusb ( https://www.felixcloutier.com/x86/paddusb:paddusw ): https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=774,433,4179,4179&cats=Arithmetic&text=paddusb
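To make the saturation idea concrete, here is a minimal sketch using the SSE2 intrinsic _mm_adds_epu8, which corresponds to paddusb. It assumes the data are unsigned 8-bit values and that the buffers do not overlap; the name paint_saturate, the unsigned char types, and the size_t count are assumptions that differ from the original signature. On targets without SSE2, only the scalar loop is compiled.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; _mm_adds_epu8 maps to paddusb */
#endif

/* Hypothetical saturating variant: results above 255 clamp to 255
 * instead of wrapping around. */
int paint_saturate(unsigned char *restrict dst, size_t n,
                   const unsigned char *restrict src, unsigned char bias)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast bias into all 16 byte lanes. */
    const __m128i vbias = _mm_set1_epi8((char)bias);
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_adds_epu8(v, vbias);             /* unsigned saturating add */
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif
    /* Scalar path: handles the tail, or everything without SSE2. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)src[i] + bias;
        dst[i] = (unsigned char)(sum > 255u ? 255u : sum);
    }
    return 0;
}
```

If the pixel values are meant to be signed, the signed counterpart (_mm_adds_epi8, paddsb) would be the one to look at instead.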
