
Would std::count_if be faster without an if?

Here's the gcc std::count_if code:

template<typename _InputIterator, typename _Predicate>
typename iterator_traits<_InputIterator>::difference_type
count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred)
{
    // [snip]
    typename iterator_traits<_InputIterator>::difference_type __n = 0;
    for (; __first != __last; ++__first)
        if (__pred(*__first))
            ++__n;
    return __n;
}

My question: would it work better (i.e., faster) to use

__n += __pred(*__first); // instead of the if statement

This version always does an add, but doesn't do a branch.

The replacement you gave is not equivalent, because there are far fewer restrictions on a predicate than you think:

  • Anything that can be used in a conditional context (that is, can be contextually converted to bool) is a valid return type for the predicate; an explicit conversion to bool is enough.
  • That return type can behave unexpectedly when added to the iterator's difference type.

25 Algorithms library [algorithms]

25.1 General [algorithms.general]

8 The Predicate parameter is used whenever an algorithm expects a function object (20.9) that, when applied to the result of dereferencing the corresponding iterator, returns a value testable as true. In other words, if an algorithm takes Predicate pred as its argument and first as its iterator argument, it should work correctly in the construct pred(*first) contextually converted to bool (Clause 4). The function object pred shall not apply any non-constant function through the dereferenced iterator.

The return type most likely to trip up your replacement is a standard integer type, with a value that is neither 0 nor 1.
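Both points can be seen in a small sketch. Everything here is made up for illustration: low_bits_set is a hypothetical predicate that returns 0, 1 or 2, Flag is a hypothetical result type that is only explicitly convertible to bool, and count_with_if / count_with_add are simplified stand-ins for the two loop forms, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical predicate: counts the two lowest bits, so "true" can be 1 or 2.
int low_bits_set(int x) { return (x & 1) + ((x >> 1) & 1); }

// Hypothetical result type that converts to bool only explicitly.
struct Flag {
    bool value;
    explicit operator bool() const { return value; }
};

// Detects whether `n += T` compiles for a std::ptrdiff_t counter.
template <typename T, typename = void>
struct addable_to_count : std::false_type {};
template <typename T>
struct addable_to_count<
    T, decltype(void(std::declval<std::ptrdiff_t&>() += std::declval<T>()))>
    : std::true_type {};

// Loop with the if statement: the result is contextually converted to bool.
template <typename It, typename Pred>
std::ptrdiff_t count_with_if(It first, It last, Pred pred) {
    std::ptrdiff_t n = 0;
    for (; first != last; ++first)
        if (pred(*first)) ++n;
    return n;
}

// Loop with the proposed replacement: the raw return value is added.
template <typename It, typename Pred>
std::ptrdiff_t count_with_add(It first, It last, Pred pred) {
    std::ptrdiff_t n = 0;
    for (; first != last; ++first)
        n += pred(*first);
    return n;
}
```

For the input {0, 1, 2, 3}, count_with_if with low_bits_set gives 3, while count_with_add gives 4, because the element 3 contributes 2. A predicate returning Flag works with count_with_if, but addable_to_count<Flag>::value is false, so count_with_add would not even compile for it.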

Also, keep in mind that compilers optimize really well nowadays (and C++ compilers especially need to, with all that template code layered deep).

So, first, your suggested code is different. Let's look at two versions that are equivalent:

template<typename _InputIterator, typename _Predicate>
typename iterator_traits<_InputIterator>::difference_type
count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred) {
    typename iterator_traits<_InputIterator>::difference_type __n = 0;
    for (; __first != __last; ++__first)
        if (__pred(*__first))
            ++__n;
    return __n;
}

And:

template<typename _InputIterator, typename _Predicate>
typename iterator_traits<_InputIterator>::difference_type
count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred) {
    typename iterator_traits<_InputIterator>::difference_type __n = 0;
    for (; __first != __last; ++__first)
        __n += (bool) __pred(*__first);
    return __n;
}

Then we can compile both with our compiler and look at the assembly. Under one compiler that I tried (clang on OS X), these produced identical code.

Perhaps your compiler will also produce identical code, or perhaps it might produce different code.
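If you want to repeat the experiment, a minimal sketch like the following may help. The names is_odd, count_odd_branch and count_odd_add are hypothetical; the two functions are non-template versions of the loops above so that each gets its own easily found symbol in the assembly output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predicate used only for this experiment.
inline bool is_odd(int x) { return (x & 1) != 0; }

// Branching form: increments inside an if statement.
std::ptrdiff_t count_odd_branch(const std::vector<int>& v) {
    std::ptrdiff_t n = 0;
    for (int x : v)
        if (is_odd(x)) ++n;
    return n;
}

// Additive form: adds the predicate result after converting it to bool.
std::ptrdiff_t count_odd_add(const std::vector<int>& v) {
    std::ptrdiff_t n = 0;
    for (int x : v)
        n += static_cast<bool>(is_odd(x));
    return n;
}
```

Compiling with something like g++ -O2 -S or clang++ -O2 -S and comparing the bodies of the two functions shows what your compiler does with each form. The outcome depends on the compiler, its version and the optimization level.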

Technically it would, but keep in mind that every nonzero value (not just values greater than 0) evaluates to true. So the called function might return a value other than 1, which would skew the result. Also, the compiler can optimize the branch away on its own, for example into a conditional move.

To expand, there are certainly ways to remove the branch by hand in the source, but this reduces readability and maintainability, as well as the ability to debug the code (e.g. by placing breakpoints), and it gains very little, since compilers are pretty damn good at optimizing these things on their own.

The code generated by the compiler does not necessarily literally reproduce C++ language constructs in machine code. Just because your C++ code has an if statement in it does not mean that machine code will be based on a branching instruction. Modern compilers are not required to and do not literally implement the behavior of the abstract C++ machine in the generated machine code.

For this reason it is impossible to say whether it will be faster or not. C++ code does not have any inherent "speed" associated with it. C++ code is never executed directly. It can't be "faster" or "slower" from the abstract point of view. If you want to analyze the performance of the code by looking at it, you have to look at the machine code generated by your compiler, not at C++ code. But an even better method would be to try both variants and profile them by actually running them on various kinds of typical input data.
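A rough sketch of such a measurement, assuming a simple "less than limit" predicate and std::chrono::steady_clock for timing. The helpers make_input, time_ms and compare, and the Timings struct, are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills a vector with pseudo-random values in [0, 99].
std::vector<int> make_input(std::size_t size, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> v(size);
    for (int& x : v) x = dist(gen);
    return v;
}

// Hypothetical helper: runs f once, stores its result, returns elapsed ms.
template <typename F>
double time_ms(F f, std::ptrdiff_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct Timings {
    double branch_ms, add_ms;
    std::ptrdiff_t branch_count, add_count;
};

// Times both loop forms while counting the elements below `limit`.
Timings compare(const std::vector<int>& data, int limit) {
    Timings t{};
    t.branch_ms = time_ms([&]() -> std::ptrdiff_t {
        std::ptrdiff_t n = 0;
        for (int x : data)
            if (x < limit) ++n;
        return n;
    }, t.branch_count);
    t.add_ms = time_ms([&]() -> std::ptrdiff_t {
        std::ptrdiff_t n = 0;
        for (int x : data)
            n += static_cast<bool>(x < limit);
        return n;
    }, t.add_count);
    return t;
}
```

A single run like this is noisy, so in practice you would repeat it many times and try several inputs (for example random and sorted data), since branch behaviour can depend on the pattern of the data. Comparing branch_count with add_count also confirms that both forms agree and keeps the results in use.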

It is quite possible that a smart compiler will generate identical code for both of your variants.


 