
In C++, the fastest way to convert a positive number to 1 and a negative number to 0

I'm writing code to draw a candlestick chart. I want the box to be red if the day's open price is greater than its close, and green if the close is higher than the open.

if(open > close) {
    boxColor = red;
} else {
    boxColor = green;
}

Pseudocode expresses this more clearly than an English sentence.

So I first wrote the code below and then tried to benchmark it, but I don't know how to get meaningful results.

for(int i = 0; i < (int)history->close.size(); i++) {
    auto open = history->open[i];
    auto close = history->close[i];
    // The sign bit of a difference is 1 when the difference is negative.
    // Shift it down to bit 0 and keep only that bit, giving 1 or 0.
    int red = (((int)close - (int)open) >> (sizeof(int) * CHAR_BIT - 1)) & 1;
    int green = (((int)open - (int)close) >> (sizeof(int) * CHAR_BIT - 1)) & 1;
    gl::color(red, green, 0);
    gl::drawSolidRect( Rectf(vec2(i - 1, open), vec2(i + 1, close)) );
}

I tried to benchmark it (code below), but each run just shows 2 ns. My main question to the community is this:

Can I actually make it faster by using a right shift to avoid a conditional branch?

#include <benchmark/benchmark.h>
#include <climits>
#include <cstdlib>

// Baseline: an empty timing loop.
static void BM_red_noWork(benchmark::State& state) {
    int open = rand();
    int close = rand();
    while (state.KeepRunning()) {
    }
}
BENCHMARK(BM_red_noWork);

// Shift: the sign bit of (close - open) is 1 exactly when open > close.
static void BM_red_fast_work(benchmark::State& state) {
    int open = rand();
    int close = rand();
    while (state.KeepRunning()) {
        int red = ((close - open) >> (sizeof(int) * CHAR_BIT - 1)) & 1;
    }
}
BENCHMARK(BM_red_fast_work);

// Conditional: red is 1 when the open is above the close.
static void BM_red_slow_work(benchmark::State& state) {
    int open = rand();
    int close = rand();
    while (state.KeepRunning()) {
        int red = open > close ? 1 : 0;
    }
}
BENCHMARK(BM_red_slow_work);

BENCHMARK_MAIN();

Thanks!

As I stated in my comment, the compiler will do these optimizations for you. Here is a minimal compilable example:

int main() {
  volatile int a = 42;
  if (a <= 0) {
    return 0;
  } else {
    return 1;
  }
}

The volatile is only there to stop the optimizer from "knowing" the value of a. It forces a to be read from memory instead.

This was compiled with the command g++ -O3 -S test.cpp, which produces a file named test.s.

Inside test.s is the assembly generated by the compiler (pardon the AT&T syntax):

movl    $42, -4(%rsp)
movl    -4(%rsp), %eax
testl   %eax, %eax
setg    %al
movzbl  %al, %eax
ret

As you can see, it is branchless. testl ANDs the value with itself to set the flags. setg then writes 1 to %al if the value is greater than 0, and 0 otherwise. movzbl zero-extends that byte into %eax, the return register, and finally the function returns.

It should be noted that this was adapted from your code. A much better way to write it is simply:

int main() {
  volatile int a = 42;
  return a > 0;
}

This generates the same assembly.

What the compiler generates for a > 0 is likely to be better than anything readable you could write directly in C++. For instance, here is your code (hopefully corrected for bit-arithmetic errors):

#include <climits>

int main() {
  volatile int a = 42;
  return ~(a >> (sizeof(int) * CHAR_BIT - 1)) & 1;
}

It compiles to:

movl    $42, -4(%rsp)
movl    -4(%rsp), %eax
notl    %eax
shrl    $31, %eax
ret

This is indeed very slightly smaller, but it is not significantly faster, especially when there is a GL call right next to it. I'd rather spend 1-3 additional cycles to get readable code than have to scratch my head wondering what my coworker (or me from 6 months ago, which is essentially the same thing) did.

EDIT: It should be remarked that the compiler also optimized the bit arithmetic I wrote, because I wrote it less well than I could have. The assembly actually computes (~a) >> 31 with a logical (unsigned) shift, shrl. This is equivalent to the ~(a >> 31) & 1 that I wrote, at least in most implementations. See the comments for details about shifting signed versus unsigned integers.

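Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.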

 
© 2020-2024 STACKOOM.COM