In C++, the fastest way to convert a positive number to 1 and a negative number to 0
I'm writing this code to make a candlestick chart, and I want a red box if the open price for the day is greater than the close. I also want the box to be green if the close is higher than the open price.
if(open > close) {
boxColor = red;
} else {
boxColor = green;
}
Pseudocode is easier than an English sentence for this.
So I wrote this code first and then tried to benchmark it, but I don't know how to get meaningful results.
for(int i = 0; i < history.get().close.size(); i++) {
auto open = history->open[i];
auto close = history->close[i];
int red = ((int)close - (int)open) >> ((int)sizeof(close) * 8);
int green = ((int)open - (int)close) >> ((int)sizeof(close) * 8);
gl::color(red,green,0);
gl::drawSolidRect( Rectf(vec2(i - 1, open), vec2(i + 1, close)) );
}
My main question to the community is this:
Can I actually make it faster by using a right shift and avoiding a conditional branch?
This is how I tried to benchmark it. Each run just shows 2ns.
#include <benchmark/reporter.h>
static void BM_red_noWork(benchmark::State& state) {
double open = (double)rand() / RAND_MAX;
double close = (double)rand() / RAND_MAX;
while (state.KeepRunning()) {
}
}
BENCHMARK(BM_red_noWork);
static void BM_red_fast_work(benchmark::State& state) {
double open = (double)rand() / RAND_MAX;
double close = (double)rand() / RAND_MAX;
while (state.KeepRunning()) {
int red = ((int)open - (int)close) >> sizeof(int) - 1;
}
}
BENCHMARK(BM_red_fast_work);
static void BM_red_slow_work(benchmark::State& state) {
double open = (double)rand() / RAND_MAX;
double close = (double)rand() / RAND_MAX;
while (state.KeepRunning()) {
int red = open > close ? 0 : 1;
}
}
BENCHMARK(BM_red_slow_work);
Thanks!
As I stated in my comment, the compiler will do these optimizations for you. Here is a minimal compilable example:
int main() {
volatile int a = 42;
if (a <= 0) {
return 0;
} else {
return 1;
}
}
The volatile is there simply to prevent the optimizer from "knowing" the value of a; it forces the value to be read.
This was compiled with the command g++ -O3 -S test.cpp, which produces a file named test.s.
Inside test.s is the assembly generated by the compiler (pardon the AT&T syntax):
movl $42, -4(%rsp)
movl -4(%rsp), %eax
testl %eax, %eax
setg %al
movzbl %al, %eax
ret
As you can see, it is branchless. It uses testl to set the flags from the value, then setg writes 1 into %al if the value is greater than zero and 0 otherwise. movzbl zero-extends that byte into the return register, and finally it returns.
It should be noted that this was adapted from your code. A much better way to write this is simply:
int main() {
volatile int a = 42;
return a > 0;
}
It also generates the same assembly.
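Applied to the candle colors from the question, the same idea might look like the sketch below. Color and candleColor are hypothetical names invented for this illustration, not part of any graphics library; the only point is that a plain comparison already yields 1 or 0, so no hand-written bit tricks are needed.

```cpp
// Hypothetical stand-in for an RGB color (not from any graphics library).
struct Color {
    int red;
    int green;
    int blue;
};

// Red when the open is above the close, green otherwise, which is the same
// rule as the if/else in the question. Each comparison yields a bool that
// converts to 1 or 0.
constexpr Color candleColor(double open, double close) {
    return Color{open > close, !(open > close), 0};
}

// Compile-time checks of the rule.
static_assert(candleColor(10.5, 9.75).red == 1, "open above close is red");
static_assert(candleColor(9.75, 10.5).green == 1, "close above open is green");
static_assert(candleColor(10.0, 10.0).green == 1, "equal prices fall in the else case");
```

The two components could then be passed to the drawing call in place of the shifted values. Whether the compiler emits a branch for this is still its decision, but in the example above it did not.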
This is likely to be better than anything readable you could write directly in C++. For instance, your code (hopefully corrected for bit arithmetic errors):
#include <climits>  // for CHAR_BIT
int main() {
volatile int a = 42;
return ~(a >> (sizeof(int) * CHAR_BIT - 1)) & 1;
}
Compiles to:
movl $42, -4(%rsp)
movl -4(%rsp), %eax
notl %eax
shrl $31, %eax
ret
Which is indeed very slightly smaller. But it's not significantly faster, especially not when you have a GL call right next to it. I'd rather spend 1-3 additional cycles to get readable code than have to scratch my head wondering what my coworker (or me from 6 months ago, which is essentially the same thing) did.
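If you want to check this on your own machine, a rough sketch is to run both variants over the same data, confirm they agree, and time each one. The function names below are made up for this illustration. The timing helper is a crude wall-clock measurement with std::chrono rather than a proper benchmarking harness, and the shift variant assumes a 32-bit int and that close - open does not overflow.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts candles whose open is above the close using a plain comparison.
std::size_t countRedCompare(const std::vector<int>& open,
                            const std::vector<int>& close) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < open.size(); ++i) {
        total += open[i] > close[i];
    }
    return total;
}

// Same count, taken from the sign bit of (close - open).
// Assumes the subtraction does not overflow and that int is 32 bits wide.
std::size_t countRedShift(const std::vector<int>& open,
                          const std::vector<int>& close) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < open.size(); ++i) {
        total += static_cast<std::uint32_t>(close[i] - open[i]) >> 31;
    }
    return total;
}

// Crude timing helper: runs work() once, stores its result in `result` so the
// call is not discarded as unused, and returns elapsed milliseconds.
template <typename Work>
double millisecondsFor(Work work, std::size_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling millisecondsFor with a lambda that wraps each counting function over a few million elements gives two numbers to compare. An empty loop body, or a result that is never used, can be removed entirely by the optimizer, which is one plausible reason every run in the question reports the same 2ns.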
EDIT: It should be remarked that the compiler also optimized the bit arithmetic I wrote, because I wrote it less well than I could have. The assembly is actually (~a) >> 31, which is equivalent to the ~(a >> 31) & 1 that I wrote (at least in most implementations with an unsigned integer, see comments for details).
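The equivalence can be checked with a small sketch. The function names are invented for this illustration, and it assumes a 32-bit int and an implementation where right-shifting a negative signed value is an arithmetic shift (implementation-defined before C++20, but what GCC does). Note one difference from a > 0: both bit versions test only the sign bit, so they return 1 for zero.

```cpp
#include <climits>
#include <cstdint>

// The readable version: 1 for strictly positive values, 0 otherwise.
constexpr int isPositive(int a) { return a > 0; }

// The expression as written in the answer, on a signed int.
constexpr int shiftThenInvert(int a) {
    return ~(a >> (sizeof(int) * CHAR_BIT - 1)) & 1;
}

// What the compiler emitted: invert first, then a logical (unsigned) shift.
constexpr int invertThenShift(std::int32_t a) {
    return static_cast<int>(~static_cast<std::uint32_t>(a) >> 31);
}

// The two bit versions agree everywhere, including the extremes.
static_assert(shiftThenInvert(42) == 1 && invertThenShift(42) == 1, "positive");
static_assert(shiftThenInvert(-42) == 0 && invertThenShift(-42) == 0, "negative");
static_assert(shiftThenInvert(INT_MIN) == invertThenShift(INT_MIN), "INT_MIN");
static_assert(shiftThenInvert(INT_MAX) == invertThenShift(INT_MAX), "INT_MAX");

// Zero is where they differ from the comparison.
static_assert(isPositive(0) == 0 && invertThenShift(0) == 1, "zero differs");
```

For candle colors the zero case corresponds to open equal to close, so which expression you pick decides which color a flat day gets.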