Would std::count_if be faster without an if?
Here's the gcc std::count_if code:
template<typename _InputIterator, typename _Predicate>
typename iterator_traits<_InputIterator>::difference_type
count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred)
{
    [snip]
    typename iterator_traits<_InputIterator>::difference_type __n = 0;
    for (; __first != __last; ++__first)
        if (__pred(*__first))
            ++__n;
    return __n;
}
My question: would it work better (i.e., faster) to use
__n += __pred(*__first); // instead of the if statement
This version always does an add, but doesn't do a branch.
The replacement you gave is not equivalent, because there are far fewer restrictions on a predicate than you think:
Anything that can be contextually converted to bool is a valid return-type for the predicate (an explicit conversion to bool is enough).
25 Algorithms library [algorithms]
25.1 General [algorithms.general]
8 The Predicate parameter is used whenever an algorithm expects a function object (20.9) that, when applied to the result of dereferencing the corresponding iterator, returns a value testable as true. In other words, if an algorithm takes Predicate pred as its argument and first as its iterator argument, it should work correctly in the construct pred(*first) contextually converted to bool (Clause 4). The function object pred shall not apply any non-constant function through the dereferenced iterator.
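To make the "explicit conversion to bool is enough" point concrete, here is a minimal sketch. The names Verdict, IsNegative and count_negatives are hypothetical and exist only for this illustration; the sketch assumes a standard library whose std::count_if tests the predicate result in a boolean context, as the quoted gcc code does.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result type: convertible to bool only explicitly.
struct Verdict {
    bool ok;
    explicit operator bool() const { return ok; }
};

// Hypothetical predicate that returns Verdict instead of bool.
struct IsNegative {
    Verdict operator()(int x) const { return Verdict{x < 0}; }
};

inline std::ptrdiff_t count_negatives(const std::vector<int>& v) {
    // Works: the algorithm only needs pred(*first) to be usable in a
    // boolean context, where the explicit conversion is applied.
    return std::count_if(v.begin(), v.end(), IsNegative{});
    // By contrast, `n += IsNegative{}(x);` would be rejected by the compiler,
    // because an explicit conversion is not applied in an arithmetic expression.
}
```

So with a predicate like this, the `+=` form is not merely slower or faster; it does not compile at all.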
The most likely return type to give your replacement indigestion would be a standard integer type, with a value that is neither 0 nor 1.
Also, keep in mind that compilers can optimize really well nowadays (and C++ ones especially need to, with all that template stuff layered deep).
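As a rough illustration of the integer case, the sketch below uses a hypothetical predicate has_bit2 that returns 4 (not 1) on a match, in the style of the C character-classification functions. The helper names count_with_if and count_with_add are also made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical predicate: returns 0 or 4, never 1.
inline int has_bit2(int x) { return x & 4; }

// Counts matches with a branch, like the gcc loop in the question.
template <typename It, typename Pred>
std::ptrdiff_t count_with_if(It first, It last, Pred pred) {
    std::ptrdiff_t n = 0;
    for (; first != last; ++first)
        if (pred(*first))
            ++n;
    return n;
}

// Adds the raw predicate result, as proposed in the question.
template <typename It, typename Pred>
std::ptrdiff_t count_with_add(It first, It last, Pred pred) {
    std::ptrdiff_t n = 0;
    for (; first != last; ++first)
        n += pred(*first);
    return n;
}
```

For the input {4, 5, 1, 12}, three elements have the bit set, so count_with_if returns 3, while count_with_add sums 4 + 4 + 0 + 4 and returns 12.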
So, first, your suggested code is different. So let's look at two equivalent versions:
template<typename _InputIterator, typename _Predicate>
typename iterator_traits<_InputIterator>::difference_type
count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred) {
    typename iterator_traits<_InputIterator>::difference_type __n = 0;
    for (; __first != __last; ++__first)
        if (__pred(*__first))
            ++__n;
    return __n;
}
And:
template<typename _InputIterator, typename _Predicate>
typename iterator_traits<_InputIterator>::difference_type
count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred) {
    typename iterator_traits<_InputIterator>::difference_type __n = 0;
    for (; __first != __last; ++__first)
        __n += (bool) __pred(*__first);
    return __n;
}
Then, we can compile this with our compiler and look at the assembly. Under one compiler that I tried (clang on OS X), these produced identical code.
Perhaps your compiler will also produce identical code, or perhaps it might produce different code.
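If you want to repeat the comparison yourself, one way is to put concrete (non-template) versions of both loops in a single translation unit, so each shows up as its own symbol in the assembly listing. This is only a sketch: the predicate is_odd and the function names count_branch and count_add are assumptions made for the example, not part of any library.

```cpp
#include <cstddef>

// Hypothetical predicate, used only to make the two functions concrete.
static bool is_odd(int x) { return (x % 2) != 0; }

// Branching form, mirroring the if-based loop.
std::ptrdiff_t count_branch(const int* first, const int* last) {
    std::ptrdiff_t n = 0;
    for (; first != last; ++first)
        if (is_odd(*first))
            ++n;
    return n;
}

// Additive form, mirroring the (bool)-cast loop.
std::ptrdiff_t count_add(const int* first, const int* last) {
    std::ptrdiff_t n = 0;
    for (; first != last; ++first)
        n += static_cast<bool>(is_odd(*first));
    return n;
}
```

Compiling the file with something like `g++ -O2 -S` or `clang++ -O2 -S` writes an assembly listing in which the bodies of count_branch and count_add can be compared side by side. Whether they match depends on the compiler, its version, and the optimization level.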
Technically it would, but keep in mind that all nonzero values evaluate to true. So the called function might return a value other than 1, which would skew the result. Also, the compiler has means to optimize the branch away into a conditional move.
To expand, there are certainly ways to optimize the branch away in code, but this reduces readability and maintainability, as well as the ability to debug the code by e.g. placing breakpoints, and it gains very little, since compilers are pretty damn good at optimizing these things on their own.
The code generated by the compiler does not necessarily reproduce C++ language constructs literally in machine code. Just because your C++ code has an if statement in it does not mean that the machine code will be based on a branching instruction. Modern compilers are not required to, and do not, literally implement the behavior of the abstract C++ machine in the generated machine code.
For this reason it is impossible to say whether it will be faster or not. C++ code does not have any inherent "speed" associated with it. C++ code is never executed directly. It can't be "faster" or "slower" from the abstract point of view. If you want to analyze the performance of the code by looking at it, you have to look at the machine code generated by your compiler, not at the C++ code. But an even better method would be to try both variants and profile them by actually running them on various kinds of typical input data.
It is quite possible that a smart compiler will generate identical code for both of your variants.
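A very rough way to try both variants is to time them with std::chrono on your own data. The sketch below is only a starting point under simple assumptions: the names count_even_branch, count_even_add and time_count are hypothetical, a single run is timed, and a serious measurement would repeat runs, use realistic input sizes, and make sure the optimizer does not discard the work.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Variant 1: counts even values with a branch.
inline std::ptrdiff_t count_even_branch(const std::vector<int>& data) {
    std::ptrdiff_t n = 0;
    for (int x : data)
        if (x % 2 == 0)
            ++n;
    return n;
}

// Variant 2: counts even values by adding the bool result.
inline std::ptrdiff_t count_even_add(const std::vector<int>& data) {
    std::ptrdiff_t n = 0;
    for (int x : data)
        n += (x % 2 == 0);
    return n;
}

// Runs one counter once and returns {count, elapsed milliseconds}.
template <typename Counter>
std::pair<std::ptrdiff_t, double> time_count(Counter counter,
                                             const std::vector<int>& data) {
    auto start = std::chrono::steady_clock::now();
    std::ptrdiff_t n = counter(data);
    auto stop = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    return {n, ms};
}
```

Calling time_count(count_even_branch, data) and time_count(count_even_add, data) on the same vector gives a count and a wall-clock time for each variant. The counts should agree, and the times are only meaningful for that compiler, those flags, and that input.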