
Is the `if` statement redundant before modulo and before assign operations?

Consider the following code:

unsigned idx;
//.. some work with idx
if( idx >= idx_max )
    idx %= idx_max;

This could be simplified to just the second line:

idx %= idx_max;

and it will achieve the same result.


Several times I have come across the following code:

unsigned x;
//... some work with x
if( x!=0 )
  x=0;

This could be simplified to

x=0;

The questions:

  • Is there any sense in using the `if`, and why? Especially with the ARM Thumb instruction set.
  • Could these `if`s be omitted?
  • What optimization does the compiler perform?

If you want to understand what the compiler is doing, you'll need to pull up some assembly. I recommend this site (I have already entered the code from the question): https://godbolt.org/g/FwZZOb .

The first example is more interesting.

int div(unsigned int num, unsigned int num2) {
    if( num >= num2 ) return num % num2;
    return num;
}

int div2(unsigned int num, unsigned int num2) {
    return num % num2;
}

This generates:

div(unsigned int, unsigned int):          # @div(unsigned int, unsigned int)
        mov     eax, edi
        cmp     eax, esi
        jb      .LBB0_2
        xor     edx, edx
        div     esi
        mov     eax, edx
.LBB0_2:
        ret

div2(unsigned int, unsigned int):         # @div2(unsigned int, unsigned int)
        xor     edx, edx
        mov     eax, edi
        div     esi
        mov     eax, edx
        ret

Basically, the compiler will not optimize away the branch, for very specific and logical reasons. If integer division cost about the same as a comparison, the branch would be pretty pointless. But integer division (which is typically how the modulus is computed) is actually very expensive: http://www.agner.org/optimize/instruction_tables.pdf . The numbers vary greatly by architecture and integer size, but the latency is typically anywhere from 15 to close to 100 cycles.

By taking a branch before performing the modulus, you can actually save yourself a lot of work. Notice, though, that the compiler also does not transform the branch-free code into a branch at the assembly level. That's because the branch has a downside too: if the modulus ends up being necessary anyway, you have just wasted a bit of time.

There's no way to make a reasonable determination about the correct optimization without knowing the relative frequency with which idx < idx_max will be true. So the compilers (gcc and clang do the same thing) opt to map the code in a relatively transparent way, leaving this choice in the hands of the developer.

So that branch might have been a very reasonable choice.

The second branch should be completely pointless, because comparison and assignment are of comparable cost. That said, you can see in the link that compilers will still not perform this optimization if they have a reference to the variable. If the value is a local variable (as in your demonstrated code), then the compiler will optimize the branch away.
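
The two cases described above can be written out side by side. This is only a sketch: the function names are hypothetical, and whether the branch survives depends on the compiler and flags, so it is worth pasting into the compiler explorer to check.

```cpp
#include <cassert>

// Hypothetical name. x is reached through a reference, so a store to it is
// visible to the caller; this is the case where the answer reports that the
// compilers keep the comparison.
void clear_via_reference(unsigned &x) {
    if (x != 0)
        x = 0;
    assert(x == 0);  // holds on both paths
}

// Hypothetical name. x is a local copy, so nothing outside the function can
// observe whether a store happened; the answer reports that the branch is
// optimized away here.
unsigned clear_local(unsigned x) {
    if (x != 0)
        x = 0;
    return x;
}
```

Both functions leave the value at zero either way; only the generated code differs.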

In sum, the first piece of code is perhaps a reasonable optimization; the second is probably just a tired programmer.

There are a number of situations where writing a variable with a value it already holds may be slower than reading it, finding out that it already holds the desired value, and skipping the write. Some systems have a processor cache which sends all write requests to memory immediately (a write-through cache). While such designs aren't commonplace today, they used to be quite common, since they can offer a substantial fraction of the performance boost that full read/write caching can offer, but at a small fraction of the cost.

Code like the above can also be relevant in some multi-CPU situations. The most common such situation is when code running simultaneously on two or more CPU cores repeatedly hits the variable. In a multi-core caching system with a strong memory model, a core that wants to write a variable must first negotiate with the other cores to acquire exclusive ownership of the cache line containing it, and must then negotiate again to relinquish that ownership the next time any other core wants to read or write it. Such operations are apt to be very expensive, and the costs have to be borne even if every write simply stores the value the storage already held. If the location becomes zero and is never written again, however, both cores can hold the cache line simultaneously for non-exclusive read-only access and never have to negotiate further for it.
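
A minimal sketch of that check-before-write pattern follows. The flag name `pending` and the function names are hypothetical. The sketch uses `std::atomic` with relaxed ordering so that concurrent access is well-defined in standard C++; that choice is an assumption of this sketch and not something the answer prescribes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared flag: nonzero means "work is still pending".
std::atomic<unsigned> pending{1};

// Read first and store only when the value would actually change. Once the
// flag is zero, every later call is a pure read, so the cache line can stay
// shared between cores.
void clear_pending() {
    if (pending.load(std::memory_order_relaxed) != 0)
        pending.store(0, std::memory_order_relaxed);
}

// Unconditional variant: every call performs a store, which requires
// exclusive ownership of the cache line even when the value is already zero.
void clear_pending_always() {
    pending.store(0, std::memory_order_relaxed);
}

// Single-threaded self-check of the conditional variant.
void self_check() {
    pending.store(5);
    clear_pending();
    assert(pending.load() == 0);
    clear_pending();  // second call only reads
    assert(pending.load() == 0);
}
```

Whether the conditional form is actually faster depends on the hardware and on how often the cores touch the flag, so it should be measured rather than assumed.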

In almost all situations where multiple CPUs could be hitting a variable, the variable should at minimum be declared volatile. The one exception, which might be applicable here, is the case where all writes to a variable that occur after the start of main() store the same value, and the code would behave correctly whether or not any store by one CPU is visible to another. If doing some operation multiple times would be wasteful but otherwise harmless, and the purpose of the variable is to say whether it needs to be done, then many implementations may be able to generate better code without the volatile qualifier than with it, provided that they don't try to improve efficiency by making the write unconditional.

Incidentally, if the object were accessed via a pointer, there would be another possible reason for the above code: if a function is designed to accept either a const object in which a certain field is zero, or a non-const object which should have that field set to zero, code like the above might be necessary to ensure defined behavior in both cases.
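
A sketch of that situation, with a hypothetical `Settings` type and function name: writing to an object that was defined const is undefined behavior, but reading it is fine, so the check keeps the function safe for a const object whose field is already zero.

```cpp
#include <cassert>

// Hypothetical type for illustration.
struct Settings {
    unsigned dirty;
};

// May be handed a pointer to an object that was defined const (cast by the
// caller), as long as that object's `dirty` field is already zero: in that
// case no store is attempted.
void ensure_clean(Settings *s) {
    assert(s != nullptr);
    if (s->dirty != 0)  // reading through the pointer is always allowed
        s->dirty = 0;   // only reached for objects that need the write
}

// An object defined const, with the field already zero.
const Settings kDefaults = {0};
```

With an unconditional `s->dirty = 0;`, passing `const_cast<Settings *>(&kDefaults)` would attempt to modify a const object; with the check, the call only reads it.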

Regarding the first block of code: this is a micro-optimization based on Chandler Carruth's recommendations for Clang (see here for more info); however, it does not necessarily follow that it would be a valid micro-optimization in this form (using `if` rather than a ternary) or on any given compiler.

Modulo is a reasonably expensive operation. If the code is executed often and there is a strong statistical lean to one side or the other of the conditional, the CPU's branch prediction (given a modern CPU) will significantly reduce the cost of the branch instruction.

To me, using the `if` there seems a bad idea.

You are right. Whether or not idx >= idx_max, it will be under idx_max after idx %= idx_max. If idx < idx_max, it will be unchanged, whether the `if` is followed or not.
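
The equivalence can be checked exhaustively for small values. This is a minimal sketch: the function names are hypothetical, and `idx_max` is assumed to be nonzero, since modulo by zero is undefined in either form.

```cpp
#include <cassert>

// Form from the question: skip the modulo when idx is already in range.
unsigned wrap_with_if(unsigned idx, unsigned idx_max) {
    assert(idx_max != 0);  // modulo by zero is undefined
    if (idx >= idx_max)
        idx %= idx_max;
    return idx;
}

// Simplified form: always take the modulo.
unsigned wrap_plain(unsigned idx, unsigned idx_max) {
    assert(idx_max != 0);
    return idx % idx_max;
}

// Returns true if both forms agree for every idx in [0, limit) and every
// idx_max in [1, limit).
bool forms_agree(unsigned limit) {
    for (unsigned idx_max = 1; idx_max < limit; ++idx_max)
        for (unsigned idx = 0; idx < limit; ++idx)
            if (wrap_with_if(idx, idx_max) != wrap_plain(idx, idx_max))
                return false;
    return true;
}
```

Calling `forms_agree(200)` compares the two forms on every pair below 200; the results match, so any difference between them is one of speed, not of value.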

While you might think branching around the modulo might save time, the real culprit, I'd say, is that when branches are followed, pipelined modern CPUs have to reset their pipeline, and that costs a relatively large amount of time. It is better to do an integer modulo, which costs roughly as much time as an integer division, than to have to follow a branch.

EDIT: It turns out that the modulus is pretty slow compared with the branch, as suggested by others here. Here is someone examining this exact question: CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!" (suggested in another SO question linked from another answer to this question).

He writes compilers, and thought it would be faster without the branch, but his benchmarks proved him wrong. Even when the branch was taken only 20% of the time, it tested faster.

Another reason not to have the `if`: one less line of code to maintain, and for someone else to puzzle out. The speaker in the above link actually created a "faster modulus" macro. IMHO, this or an inline function is the way to go for performance-critical applications, because your code will be much more understandable without the branch, but will execute just as fast.
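
An inline function along those lines might look like the sketch below. The names `fast_mod` and `next_index` are hypothetical, and this is not the macro from the talk, only an illustration of keeping the branch in one place.

```cpp
#include <cassert>

// Hypothetical helper: reduce idx into [0, idx_max), skipping the division
// when idx is already in range.
inline unsigned fast_mod(unsigned idx, unsigned idx_max) {
    assert(idx_max != 0);  // modulo by zero is undefined
    return idx >= idx_max ? idx % idx_max : idx;
}

// Example use: advance a ring-buffer index by one slot. Assumes idx is
// already below capacity, so idx + 1 cannot overflow.
inline unsigned next_index(unsigned idx, unsigned capacity) {
    return fast_mod(idx + 1, capacity);
}
```

Call sites then read as a plain modulo, and the decision to branch or not can be changed in a single function after benchmarking.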

Finally, the speaker in the above video is planning to make this optimization known to compiler writers. Thus, the `if` will probably be added for you, even if it is not in the code. Hence, the mod alone will do once that comes about.
