Compiler predicate optimizations

Consider the following example conditions/predicates:

  1. x > 10 and x > 20
  2. (x > 10 or x == 10) and (x < 10 or x == 10), aka x >= 10 and x <= 10

Predicate 1 can be simplified to x > 20, and predicate 2 can be simplified to x == 10. Would a compiler optimize this kind of predicate (or more complex ones), and if so, what algorithms are used to do so?
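For an integer x, both simplifications can be checked by reading each comparison with a constant as an interval of integer values: a conjunction holds exactly on the intersection of the intervals, and a disjunction on their union. A worked version of the two examples, assuming x ranges over the integers:

```latex
\begin{aligned}
x > 10 \land x > 20 &\iff x \in [11, \infty) \cap [21, \infty) = [21, \infty) \iff x > 20 \\
(x > 10 \lor x = 10) \land (x < 10 \lor x = 10) &\iff x \in [10, \infty) \cap (-\infty, 10] = \{10\} \iff x = 10
\end{aligned}
```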

What are some common optimization techniques for predicates?

It depends on the compiler, but clang and gcc do perform this optimisation:

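#include <stdio.h>

/* The condition in foo simplifies to x > 20. */
void foo(int x) {
  if (x > 10 && x > 20)
    puts("foo");
}

/* The condition in foo2 simplifies to x == 10. */
void foo2(int x) {
  if ((x > 10 || x == 10) && (x < 10 || x == 10))
    puts("foo2");
}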

If you look at the generated assembly, both functions contain a single comparison.
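A compiler may only make these rewrites because they preserve the result for every possible input. The minimal C sketch below checks that by brute force over a window of values around the constants, plus `INT_MIN` and `INT_MAX`. The helper names (`p1`, `p2`, `check_equivalence`, ...) are hypothetical, and sampling is only evidence, not proof; a compiler instead relies on rewrite rules that are sound for all inputs.

```c
#include <limits.h>
#include <stdbool.h>

/* Original predicates from the question. */
static bool p1(int x) { return x > 10 && x > 20; }
static bool p2(int x) { return (x > 10 || x == 10) && (x < 10 || x == 10); }

/* Simplified forms a compiler may substitute. */
static bool p1_simplified(int x) { return x > 20; }
static bool p2_simplified(int x) { return x == 10; }

/* True if both original/simplified pairs give the same answer for x. */
static bool agrees(int x) {
  return p1(x) == p1_simplified(x) && p2(x) == p2_simplified(x);
}

/* Checks a window around the constants 10 and 20, plus the extreme int values. */
static bool check_equivalence(void) {
  for (int x = -1000; x <= 1000; x++) {
    if (!agrees(x))
      return false;
  }
  return agrees(INT_MIN) && agrees(INT_MAX);
}
```

Calling `check_equivalence()` should return true for these two pairs.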

Clang (which uses LLVM) does this in the instruction combining pass ('instcombine'). You can see some of the transformations in the InstructionSimplify.cpp source code.
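To illustrate the kind of reasoning involved, here is a toy model of range-based folding, assuming `int` operands compared against constants: each comparison becomes an inclusive range, `&&` becomes intersection, and `||` becomes union when the union is still a single range. This is not LLVM's actual code; `Range`, `range_of`, `range_and`, `range_or` and `classify` are hypothetical names. (LLVM does have a `ConstantRange` class for integer-range reasoning of this sort.)

```c
#include <limits.h>
#include <stdbool.h>

/* Inclusive range [lo, hi] of int values; empty when lo > hi.
   long long bounds let c + 1 and c - 1 be computed without overflow. */
typedef struct { long long lo, hi; } Range;

typedef enum { CMP_GT, CMP_GE, CMP_LT, CMP_LE, CMP_EQ } CmpOp;

static bool range_empty(Range r) { return r.lo > r.hi; }

/* Set of ints satisfying "x OP c". */
static Range range_of(CmpOp op, int c) {
  switch (op) {
    case CMP_GT: return (Range){(long long)c + 1, INT_MAX};
    case CMP_GE: return (Range){c, INT_MAX};
    case CMP_LT: return (Range){INT_MIN, (long long)c - 1};
    case CMP_LE: return (Range){INT_MIN, c};
    case CMP_EQ:
    default:     return (Range){c, c};
  }
}

/* "a && b" holds exactly on the intersection of the two ranges. */
static Range range_and(Range a, Range b) {
  Range r = { a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
  return r;
}

/* "a || b" is a single range only if the ranges overlap or touch.
   Returns false (and leaves *out untouched) when there is a gap. */
static bool range_or(Range a, Range b, Range *out) {
  if (range_empty(a)) { *out = b; return true; }
  if (range_empty(b)) { *out = a; return true; }
  if (a.hi + 1 < b.lo || b.hi + 1 < a.lo) return false;
  out->lo = a.lo < b.lo ? a.lo : b.lo;
  out->hi = a.hi > b.hi ? a.hi : b.hi;
  return true;
}

typedef enum { FOLD_FALSE, FOLD_TRUE, FOLD_EQ, FOLD_GE, FOLD_LE, FOLD_TWO_CMPS } FoldKind;

/* Picks the cheapest single test that matches the range, if one exists. */
static FoldKind classify(Range r) {
  if (range_empty(r)) return FOLD_FALSE;                    /* never true   */
  if (r.lo == INT_MIN && r.hi == INT_MAX) return FOLD_TRUE; /* always true  */
  if (r.lo == r.hi) return FOLD_EQ;                         /* x == lo      */
  if (r.hi == INT_MAX) return FOLD_GE;                      /* x >= lo      */
  if (r.lo == INT_MIN) return FOLD_LE;                      /* x <= hi      */
  return FOLD_TWO_CMPS;                                     /* lo <= x <= hi */
}
```

For example, `range_and(range_of(CMP_GT, 10), range_of(CMP_GT, 20))` yields [21, INT_MAX], which `classify` reports as a single `x >= 21` test (i.e. `x > 20`); the second predicate folds to the one-element range [10, 10], i.e. `x == 10`.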

Looking at the IL that the C# compiler emits for the following method, the compiler does not seem smart enough to simplify it, at least in this case. I'm not sure, though, what happens when the IL is translated into native code, or even later in the processor pipeline; there will be further optimizations kicking in:

private static bool Compare(int x)
{
   return (x > 10 || x == 10) && (x < 10 || x == 10);
}

Corresponding IL:

IL_0000: ldarg.0      // x
IL_0001: ldc.i4.s     10 // 0x0a
IL_0003: bgt.s        IL_000a
IL_0005: ldarg.0      // x
IL_0006: ldc.i4.s     10 // 0x0a
IL_0008: bne.un.s     IL_0017
IL_000a: ldarg.0      // x
IL_000b: ldc.i4.s     10 // 0x0a
IL_000d: blt.s        IL_0015
IL_000f: ldarg.0      // x
IL_0010: ldc.i4.s     10 // 0x0a
IL_0012: ceq          
IL_0014: ret          
IL_0015: ldc.i4.1     
IL_0016: ret          
IL_0017: ldc.i4.0     
IL_0018: ret
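To make the branch structure easier to follow, here is a hand translation of the IL above into C, with one `goto` label per branch target. The function name is hypothetical and this is a reading aid, not something the C# compiler or JIT produces.

```c
#include <stdbool.h>

/* Hand translation of the non-optimized IL; labels mirror the IL offsets. */
static bool compare_nonopt_il(int x) {
  if (x > 10) goto IL_000a;   /* IL_0003: bgt.s                  */
  if (x != 10) goto IL_0017;  /* IL_0008: bne.un.s               */
IL_000a:
  if (x < 10) goto IL_0015;   /* IL_000d: blt.s                  */
  return x == 10;             /* IL_0012: ceq; ret               */
IL_0015:
  return true;                /* IL_0015: ldc.i4.1; ret          */
IL_0017:
  return false;               /* IL_0017: ldc.i4.0; ret          */
}
```

Note that the `IL_0015` path can never be taken for an `int`: reaching `IL_000a` requires `x > 10` or `x == 10`, so `x < 10` is always false there. That dead branch is exactly the kind of redundancy range-based simplification would remove.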

Here's the second (optimized) version:

private static bool Compare(int x)
{
   return x >= 10 && x <= 10;
}

And, again, the corresponding IL code:

IL_0000: ldarg.0      // x
IL_0001: ldc.i4.s     10 // 0x0a
IL_0003: blt.s        IL_000e
IL_0005: ldarg.0      // x
IL_0006: ldc.i4.s     10 // 0x0a
IL_0008: cgt          
IL_000a: ldc.i4.0     
IL_000b: ceq          
IL_000d: ret          
IL_000e: ldc.i4.0     
IL_000f: ret          
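CIL has `ceq`, `cgt` and `clt` (plus `.un` variants) but no "less than or equal" compare instruction, which is why `x <= 10` appears as `cgt` followed by a comparison with 0. A hand translation into C (hypothetical function name, reading aid only):

```c
#include <stdbool.h>

/* Hand translation of the optimized IL. */
static bool compare_opt_il(int x) {
  if (x < 10) return false;   /* IL_0003: blt.s IL_000e, then ldc.i4.0; ret  */
  int gt = x > 10;            /* IL_0008: cgt pushes 1 or 0                  */
  return gt == 0;             /* IL_000a/IL_000b: ldc.i4.0; ceq = !(x > 10)  */
}
```

For `int`, `!(x > 10)` is the same as `x <= 10`, so this still computes `x >= 10 && x <= 10`.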

Since the second version is clearly shorter, it has a better chance of being inlined by the JIT at runtime, so we should expect it to run a bit faster.

Finally, the third version, let's call it "the best" (x == 10):

private static bool Compare(int x)
{
    return x == 10;
}

And its IL:

IL_0000: ldarg.0      // x
IL_0001: ldc.i4.s     10 // 0x0a
IL_0003: ceq          
IL_0005: ret          

Nice and concise.

Running a benchmark with BenchmarkDotNet and [MethodImpl(MethodImplOptions.NoInlining)] shows that the runtime behaviour still differs substantially between the implementations:

Case 1: testing with values other than 10 (negative case).

     Method |       Jit | Platform |     Mean 
----------- |---------- |--------- |----------
   TestBest | LegacyJit |      X64 | 2.329 ms
    TestOpt | LegacyJit |      X64 | 2.704 ms
 TestNonOpt | LegacyJit |      X64 | 3.324 ms
   TestBest | LegacyJit |      X86 | 1.956 ms
    TestOpt | LegacyJit |      X86 | 2.178 ms
 TestNonOpt | LegacyJit |      X86 | 2.796 ms
   TestBest |    RyuJit |      X64 | 2.480 ms
    TestOpt |    RyuJit |      X64 | 2.489 ms
 TestNonOpt |    RyuJit |      X64 | 3.101 ms
   TestBest |    RyuJit |      X86 | 1.865 ms
    TestOpt |    RyuJit |      X86 | 2.170 ms
 TestNonOpt |    RyuJit |      X86 | 2.853 ms

Case 2: testing with 10 (positive case).

     Method |       Jit | Platform |     Mean
----------- |---------- |--------- |---------
   TestBest | LegacyJit |      X64 | 2.396 ms
    TestOpt | LegacyJit |      X64 | 2.780 ms
 TestNonOpt | LegacyJit |      X64 | 3.370 ms
   TestBest | LegacyJit |      X86 | 2.044 ms
    TestOpt | LegacyJit |      X86 | 2.199 ms
 TestNonOpt | LegacyJit |      X86 | 2.533 ms
   TestBest |    RyuJit |      X64 | 2.470 ms
    TestOpt |    RyuJit |      X64 | 2.532 ms
 TestNonOpt |    RyuJit |      X64 | 2.552 ms
   TestBest |    RyuJit |      X86 | 1.911 ms
    TestOpt |    RyuJit |      X86 | 2.210 ms
 TestNonOpt |    RyuJit |      X86 | 2.753 ms

Interestingly, on X64 the new JIT (RyuJit) narrows some of the gaps: in case 1, TestOpt runs in about the same time as TestBest (2.489 ms vs. 2.480 ms), and in case 2, TestOpt and TestNonOpt are nearly identical (2.532 ms vs. 2.552 ms).

The question remains: why does the C# compiler not optimize these kinds of patterns? My guess would be that features like operator overloading make it impossible for the compiler to draw correct logical conclusions in general, but I might be way off. Also, for the built-in value types it should be possible. Oh well...
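As a small illustration of why a type's semantics decide which rewrites are legal (an aside, assuming IEEE 754 floating point and no fast-math flags): for `int`, `!(x > 10)` and `x <= 10` are interchangeable, but for `double` they differ on NaN, because every ordered comparison with NaN is false. So even for built-in types, a rewrite is only valid if it holds for every value of the type. The function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* For int, these two agree on every value. */
static bool not_gt_int(int x) { return !(x > 10); }
static bool le_int(int x) { return x <= 10; }

/* For double they disagree on NaN: NaN > 10.0 and NaN <= 10.0 are both false,
   so the first returns true and the second returns false. */
static bool not_gt_double(double x) { return !(x > 10.0); }
static bool le_double(double x) { return x <= 10.0; }
```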

Lastly, here's a good article on optimizing boolean expressions: https://hbfs.wordpress.com/2008/08/26/optimizing-boolean-expressions-for-speed/

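Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.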

© 2020-2024 STACKOOM.COM