
Compile-time evaluation of a ternary operator with mixed const/non-const operands

Consider the following example:

template<int X> class MyClass
{
    public:
        MyClass(int x) {_ncx = x;}
        void test() 
        {
            for (unsigned int i = 0; i < 1000000; ++i) {
                if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
                    /* SOMETHING */
                } else {
                    /* SOMETHING */
                }
            }
        }
    protected:
        static const int _cx = (X < 0) ? (-X) : (X);
        int _ncx;
};

My question is: will MyClass<-6>::test() and MyClass<6>::test() run at different speeds?

I hope so, because with a negative template parameter the if in test() can be evaluated at compile time. However, I'm not sure how a compiler behaves when one operand of a ternary operator is a compile-time constant and the other is a runtime value, which is the case here.

Note: this is a purely "theoretical" question. If there is a non-zero chance that the answer is "yes", I will implement some classes in my code with such compile-time template parameters; if not, I will only provide runtime versions.

Move the conditional outside the loop:

        ...
        if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
            for (unsigned int i = 0; i < 1000000; ++i) {
                /* SOMETHING */
            }
        } else {
            for (unsigned int i = 0; i < 1000000; ++i) {
                /* SOMETHING */
            }
        }
        ...

That way you don't depend on the compiler's optimizer to remove unused code: even if the compiler keeps the unused part of the conditional, you pay for one conditional branch in total rather than one on every iteration of the loop.
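Filled out as a complete class, the hoisted version might look like the sketch below. The public counters taken and not_taken are hypothetical stand-ins for the two /* SOMETHING */ bodies, added only so the example compiles and does observable work; they are not part of the original question.

```cpp
#include <cassert>

template<int X> class MyClass
{
    public:
        explicit MyClass(int x) : taken(0), not_taken(0), _ncx(x) {}

        void test()
        {
            // The condition cannot change during the loop, so evaluate it
            // once and then run one of two loops that contain no test.
            if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
                for (unsigned int i = 0; i < 1000000; ++i) {
                    ++taken;      // hypothetical stand-in for the first /* SOMETHING */
                }
            } else {
                for (unsigned int i = 0; i < 1000000; ++i) {
                    ++not_taken;  // hypothetical stand-in for the second /* SOMETHING */
                }
            }
        }

        long taken;       // hypothetical: iterations that ran the first body
        long not_taken;   // hypothetical: iterations that ran the second body

    protected:
        static const int _cx = (X < 0) ? (-X) : (X);
        int _ncx;
};
```

For example, after MyClass<-6>(0).test() the counter taken is 1000000 and not_taken is 0, because _cx is 6; MyClass<6>(3).test() puts all 1000000 iterations in not_taken, because _ncx is 3.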

With my compiler (clang++ v2.9 on OS X), I compiled this similar but not identical code:

void foo();
void bar();

template<int N>
void do_something( int arg ) {
  if ( N<0 && arg<0 ) { foo(); }
  else { bar(); }
}

// Some functions to instantiate the templates.
void one_fn(int arg) {
  do_something<1>(arg);
}

void neg_one_fn(int arg) {
  do_something<-1>(arg);
}

Compiling it with clang++ -S -O3 generates the following assembly.

one_fn = do_something<1>

The first function's assembly clearly contains only the call to bar.

    .globl  __Z6one_fni
    .align  4, 0x90
__Z6one_fni:                            ## @_Z6one_fni
Leh_func_begin0:
    pushl   %ebp
    movl    %esp, %ebp
    popl    %ebp
    jmp __Z3barv                ## TAILCALL
Leh_func_end0:

neg_one_fn = do_something<-1>

The second function has been reduced to a single sign test on arg that calls either foo or bar; the N<0 part of the condition is gone.

    .globl  __Z10neg_one_fni
    .align  4, 0x90
__Z10neg_one_fni:                       ## @_Z10neg_one_fni
Leh_func_begin1:
    pushl   %ebp
    movl    %esp, %ebp
    cmpl    $0, 8(%ebp)
    jns LBB1_2                  ## %if.else.i
    popl    %ebp
    jmp __Z3foov                ## TAILCALL
LBB1_2:                                 ## %if.else.i
    popl    %ebp
    jmp __Z3barv                ## TAILCALL
Leh_func_end1:

Summary

So the compiler inlined each template instantiation into its caller and then optimised away the part of the condition that depends only on the template parameter. The kind of transformation you are hoping for does occur in current compilers. I got similar results (with less readable assembly) from an old g++ 4.0.1 compiler too.
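If you would rather not rely on the optimizer for the compile-time half of the condition, C++17's if constexpr (newer than the compilers discussed here) makes the split explicit. The sketch below assumes a C++17 compiler, and the helper name condition is hypothetical. For X < 0 the discarded branch is not instantiated and the result is a constant; for X >= 0 only the runtime comparison remains.

```cpp
// Requires C++17 (if constexpr, [[maybe_unused]]).
// Hypothetical helper: the question's loop condition for a given _ncx value.
template<int X>
constexpr bool condition([[maybe_unused]] int ncx)
{
    if constexpr (X < 0) {
        // X is negative, so _cx == -X is known at compile time and the
        // result is a constant; ncx is never read on this path.
        constexpr int cx = -X;
        return cx > 5;
    } else {
        // X is non-negative: the result depends on the runtime value.
        return ncx > 5;
    }
}

static_assert(condition<-6>(0), "X = -6 gives _cx = 6 > 5");
static_assert(condition<-6>(100), "the argument is ignored when X < 0");
static_assert(!condition<-3>(100), "X = -3 gives _cx = 3, not > 5");
```

Inside MyClass::test() the loop condition would then become if (condition<X>(_ncx)). This only makes the compile-time part explicit; removing the test from the generated loop still depends on the call being inlined, as in the assembly above.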

Addendum:

I decided this example wasn't close enough to your original case, since it didn't involve the ternary operator, so I changed it to the following and got the same kind of results:

template<int X>
void do_something_else( int _ncx ) {
  static const int _cx = (X<0) ? (-X) : (X);
  if ( (X < 0) ? (_cx > 5) : (_ncx > 5)) {
    foo();
  } else {
    bar();
  }
}

void a(int arg) {
  do_something_else<1>(arg);
}

void b(int arg) {
  do_something_else<-1>(arg);
}

This generates the following assembly.

a() = do_something_else<1>

This still contains the branch, because for X = 1 the condition depends on the runtime argument.

__Z1ai:                                 ## @_Z1ai
Leh_func_begin2:
    pushl   %ebp
    movl    %esp, %ebp
    cmpl    $6, 8(%ebp)
    jl  LBB2_2                  ## %if.then.i
    popl    %ebp
    jmp __Z3foov                ## TAILCALL
LBB2_2:                                 ## %if.else.i
    popl    %ebp
    jmp __Z3barv                ## TAILCALL
Leh_func_end2:

b() = do_something_else<-1>

The branch is optimised away: for X = -1, _cx is 1, so the condition is always false and bar is called unconditionally.

__Z1bi:                                 ## @_Z1bi
Leh_func_begin3:
    pushl   %ebp
    movl    %esp, %ebp
    popl    %ebp
    jmp __Z3barv                ## TAILCALL
Leh_func_end3:

Whether this happens depends on how capable your compiler's optimizer is. I recommend writing a small benchmark program and testing it in your own environment to find out for sure.
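A minimal benchmark along those lines might look like the sketch below. The volatile variable sink is a hypothetical stand-in for /* SOMETHING */, used so the optimizer cannot delete the loop, and time_test_ms and run_benchmark are illustrative names. Timing each variant once, back to back, is crude: compile with optimizations (for example -O2 or -O3), repeat the run several times, and try swapping the order before drawing conclusions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical sink for the loop body: volatile stores cannot be optimized away.
volatile unsigned int sink = 0;

template<int X> class MyClass
{
    public:
        explicit MyClass(int x) : _ncx(x) {}

        // Same loop as in the question; returns how often the first branch ran.
        unsigned long test()
        {
            unsigned long taken = 0;
            for (unsigned int i = 0; i < 1000000; ++i) {
                if ((X < 0) ? (_cx > 5) : (_ncx > 5)) {
                    sink = i;      // stand-in for the first /* SOMETHING */
                    ++taken;
                } else {
                    sink = 0;      // stand-in for the second /* SOMETHING */
                }
            }
            return taken;
        }

    protected:
        static const int _cx = (X < 0) ? (-X) : (X);
        int _ncx;
};

// Runs obj.test() once and returns the elapsed wall-clock time in milliseconds.
template<typename T>
double time_test_ms(T& obj, unsigned long& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = obj.test();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times MyClass<-6>::test() against MyClass<6>::test().
// ncx should come from a runtime source so it is not a compile-time constant.
void run_benchmark(int ncx)
{
    MyClass<-6> neg(ncx);
    MyClass<6> pos(ncx);
    unsigned long neg_taken = 0, pos_taken = 0;
    const double neg_ms = time_test_ms(neg, neg_taken);
    const double pos_ms = time_test_ms(pos, pos_taken);
    // Sanity check: for X = -6, _cx is 6, so every iteration takes the first branch.
    assert(neg_taken == 1000000);
    std::printf("MyClass<-6>::test(): %.3f ms (%lu taken)\n", neg_ms, neg_taken);
    std::printf("MyClass<6>::test():  %.3f ms (%lu taken)\n", pos_ms, pos_taken);
}
```

One way to call it is run_benchmark(argc + 5) from main: with no command-line arguments, _ncx is 6, so both variants take the first branch on every iteration and any timing difference comes from how the condition is evaluated.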
