Converting consecutive if statements to branchless form

Question

I'm stuck trying to figure out how to convert the last two "if" statements of the following code to a branchless form.

int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;

u = rand() % 4;
if ( y > x) u = 5;
if (-y > x) u = 4;

Or, in case the above turns out to be too difficult, you can consider them as:

if (x > 0) u = 5;
if (y > 0) u = 4;

I think what trips me up is that those ifs don't have an else to catch the other case. If they did, I could probably have adapted a variation of a branchless abs (or max/min) function.

The rand() calls you see aren't part of the real code. I added them just to hint at the ranges that the variables x, y and u can have at the time the two branches happen.

Assembly or machine code is allowed for this purpose.

EDIT:

After a bit of braingrinding I managed to put together a working branchless version:

int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;

u = rand() % 4;
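u += (5-u)*((unsigned int)(x-y) >> 31); /* x-y < 0, i.e.  y > x: u = 5 */
u += (4-u)*((unsigned int)(x+y) >> 31); /* x+y < 0, i.e. -y > x: u = 4 (applied last, as in the if version) */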

Unfortunately, due to the integer arithmetic involved, the original version with if statements turns out to be about 30% faster.

The compiler knows where the party is at.
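A minimal sketch of one way to drop the multiplies from a branchless version like the one above: turn each comparison into an all-ones or all-zeros mask and blend with bitwise operations. The function names are hypothetical, the ranges are the ones hinted at by the rand() calls, and nothing here has been timed, so whether it beats the if version on a given compiler and CPU is an open question.

```c
#include <assert.h>

/* Reference: the two consecutive ifs from the question. */
int select_branchy(int u, int x, int y) {
    if ( y > x) u = 5;
    if (-y > x) u = 4;
    return u;
}

/* Mask-based sketch: a comparison yields 0 or 1, so negating it
   gives 0 or -1 (all bits set), which selects between old and new u. */
int select_masked(int u, int x, int y) {
    int m5 = -(y > x);          /* -1 when  y > x, else 0 */
    int m4 = -(-y > x);         /* -1 when -y > x, else 0 */
    u = (u & ~m5) | (5 & m5);   /* same effect as: if ( y > x) u = 5; */
    u = (u & ~m4) | (4 & m4);   /* same effect as: if (-y > x) u = 4; */
    return u;
}

/* Exhaustive comparison over u in [0,3], x and y in [-50,49]. */
void check_masked(void) {
    for (int u = 0; u < 4; ++u)
        for (int x = -50; x < 50; ++x)
            for (int y = -50; y < 50; ++y)
                assert(select_branchy(u, x, y) == select_masked(u, x, y));
}
```

Calling check_masked() from main confirms that both functions agree on every input in those ranges. A compiler is still free to turn either form into conditional moves or branches, so the generated assembly is worth inspecting.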

Answer 1

[All: this answer was written with the assumption that the calls to rand() were part of the problem. I offer improvements below under that assumption. OP belatedly clarified that he only used rand to tell us the ranges (and presumably the distribution) of the values of x and y. It's unclear whether he meant that for the value of u, too. Anyway, enjoy my improved answer to the problem he didn't really pose.]

I think you'd be better off recoding this as:

int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;

if ( y > x) u = 5;
else if (-y > x) u = 4;
else u = rand() % 4;

This calls the last rand only 1/4 as often as OP's original code. Since I assume rand (and the divides) are much more expensive than compare-and-branch, this would be a significant savings.
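The 1/4 figure can be checked by enumeration, assuming x and y are uniform over [-50, 49] as the rand() calls suggest. The sketch below (hypothetical function names) counts the (x, y) pairs for which neither condition holds, which are the only cases that reach the final rand().

```c
#include <assert.h>

/* Count the (x, y) pairs in [-50,49] x [-50,49] that fall through
   both tests and therefore still need the last rand() call. */
int count_fallthrough(void) {
    int n = 0;
    for (int x = -50; x < 50; ++x)
        for (int y = -50; y < 50; ++y)
            n += !(y > x) && !(-y > x);
    return n;
}

/* The fall-through cases should be one quarter of all 100 * 100 pairs. */
void check_quarter(void) {
    assert(count_fallthrough() * 4 == 100 * 100);
}
```

The fall-through region is x >= |y|, which covers 2500 of the 10000 pairs.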

If your rand generator produces a lot of truly random bits (e.g. 16) on each call, as it should, you can call it just once (I've assumed rand is more expensive than divide, YMMV):

int u, x, y, t;
t = rand() ;
u = t % 4;
t = t >> 2;
x = t % 100 - 50;
y = ( t / 100 ) %100 - 50;

if ( y > x) u = 5;
else if (-y > x) u = 4;

I think the rand function in the MS C library is not good enough for this if you want really random values. I had to code my own; it turned out faster anyway.
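A small sketch of why the width of the generator matters for the single-call version: u, x and y together need 4 * 100 * 100 = 40000 distinct values of t. The helper names are hypothetical, and 32767 is used as the example limit because it is the smallest RAND_MAX the C standard permits (assumed here to be what the MS library provides).

```c
#include <assert.h>

/* Split one random value t into u, x and y, as in the code above. */
void split_random(unsigned t, int *u, int *x, int *y) {
    *u = (int)(t % 4);
    t >>= 2;
    *x = (int)(t % 100) - 50;
    *y = (int)((t / 100) % 100) - 50;
}

/* Largest y that can be produced when t never exceeds t_max. */
int max_y_for(unsigned t_max) {
    int best = -50;
    for (unsigned t = 0; t <= t_max; ++t) {
        int u, x, y;
        split_random(t, &u, &x, &y);
        if (y > best) best = y;
    }
    return best;
}

void check_ranges(void) {
    assert(max_y_for(32767u) == 31);  /* 15-bit generator: y stops at 31 */
    assert(max_y_for(39999u) == 49);  /* full range needs t up to 39999 */
}
```

With only 15 bits, y never exceeds 31, so the y > x test would fire less often than intended.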

You might also get rid of the divide by multiplying by a reciprocal (untested):

int u, x, y;
unsigned int t;
unsigned long long t2;   // needs 64 bits; unsigned long is only 32 bits on some platforms
t = rand() ;
u = t % 4;

{ // Compute t / (4*100) as a 32.32 fixed-point value in t2 by multiplying.
  // The reciprocal below is an integer constant expression, so it is folded at compile time.
  // The remaining multiply can be done by one machine instruction
  // (typically 32bits * 32bits --> 64bits) widely found in processors.
  // The "4" has the same effect as the t = t >> 2 in the previous version.
  // This assumes t < 4*100*100; larger values push x outside [-50, 49].
  t2 = t * (((1ULL << 32) / (4 * 100)) + 1);
}
x = (int)(t2 >> 32) - 50; // take the upper word (if compiler won't, do this in assembler)
{ // compute y from the fractional remainder of the above multiply,
  // which is sitting in the lower 32 bits of the t2 product
  y = (int)(((t2 & 0xFFFFFFFFULL) * 100) >> 32) - 50;
}

if ( y > x) u = 5;
else if (-y > x) u = 4;

If your compiler won't produce the "right" instructions, it should be straightforward to write assembly code to do this.
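A sketch for checking the reciprocal idea against plain division, under two assumptions that are not in the answer: t is limited to [0, 40000), and the reciprocal of 400 is rounded up rather than truncated. The names RECIP_400, split_by_reciprocal and check_reciprocal are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2^32 / 400, rounded up, as a 0.32 fixed-point constant. */
#define RECIP_400 ((uint32_t)((UINT64_C(1) << 32) / 400 + 1))

/* Upper 32 bits of the product hold t / 400; the lower 32 bits hold the
   fractional remainder, which is scaled by 100 to give the second value. */
void split_by_reciprocal(uint32_t t, int *x, int *y) {
    uint64_t p = (uint64_t)t * RECIP_400;
    *x = (int)(p >> 32) - 50;
    *y = (int)(((p & 0xFFFFFFFFu) * 100) >> 32) - 50;
}

/* Compare against ordinary division for every t in [0, 40000). */
void check_reciprocal(void) {
    for (uint32_t t = 0; t < 40000; ++t) {
        int x, y;
        split_by_reciprocal(t, &x, &y);
        assert(x == (int)(t / 400) - 50);
        assert(y == (int)((t % 400) / 4) - 50);
    }
}
```

Rounding the constant up matters: with a truncated reciprocal, exact multiples of 400 land just below the next integer and the upper word comes out one too small.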

Answer 2

Here are some tricks using array indices; they may be quite fast if the compiler/CPU has one-step instructions to convert comparison results to 0-1 values (e.g. x86's "sete" and similar).

int ycpx[3];

/* ... */
ycpx[0] = 4;
ycpx[1] = u;
ycpx[2] = 5;
u = ycpx[(-y <= x) + ((-y <= x) & (y > x))]; /* index 0 if -y > x, else 2 if y > x, else 1 */

Alternate form

int v1[2];
int v2[2];

/* ... */
v1[0] = u;
v1[1] = 5;
v2[1] = 4;
v2[0] = v1[y > x];
u = v2[-y > x];

Almost unreadable...

NOTE: In both cases the initialization of the array elements containing 4 and 5 may be moved into the declaration, and the arrays may be made static if reentrancy is not a problem for you.
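Because index tricks like these are easy to get subtly wrong, here is a sketch that checks both array forms against the original ifs over the ranges given in the question. The function names are hypothetical, and the index expression in the three-entry form is one possible formulation, chosen so that -y > x takes priority when both conditions hold, as it does in the if version.

```c
#include <assert.h>

/* Reference: the two consecutive ifs from the question. */
int select_ref(int u, int x, int y) {
    if ( y > x) u = 5;
    if (-y > x) u = 4;
    return u;
}

/* Three-entry table: index 0 -> 4, index 1 -> keep u, index 2 -> 5. */
int select_table3(int u, int x, int y) {
    int ycpx[3] = { 4, u, 5 };
    int keep = (-y <= x);               /* 0 forces index 0, i.e. u = 4 */
    return ycpx[keep + (keep & (y > x))];
}

/* Two chained two-entry tables, as in the alternate form. */
int select_chain(int u, int x, int y) {
    int v1[2] = { u, 5 };
    int v2[2];
    v2[1] = 4;
    v2[0] = v1[y > x];
    return v2[-y > x];
}

/* Exhaustive comparison over u in [0,3], x and y in [-50,49]. */
void check_array_forms(void) {
    for (int u = 0; u < 4; ++u)
        for (int x = -50; x < 50; ++x)
            for (int y = -50; y < 50; ++y) {
                int want = select_ref(u, x, y);
                assert(select_table3(u, x, y) == want);
                assert(select_chain(u, x, y) == want);
            }
}
```

The case worth watching is x < -|y|, where both conditions are true and the result must be 4.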

Answer 1
2 2015-02-15 16:58:02

Answer 2
0 2015-02-16 22:53:53
