
Conversion to branchless of consecutive if statements

I'm stuck trying to figure out how to convert the last two "if" statements of the following code to a branchless form.

int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;

u = rand() % 4;
if ( y > x) u = 5;
if (-y > x) u = 4;

Or, in case the above turns out to be too difficult, you can consider them as:

if (x > 0) u = 5;
if (y > 0) u = 4;

I think what trips me up is that those ifs don't have an else branch. If they did, I could probably have adapted a variation of a branchless abs (or max/min) function.

The rand() calls you see aren't part of the real code. I added them only to hint at the ranges that the variables x, y and u can have at the time the two branches happen.

Assembly (machine code) is allowed for this purpose.

EDIT:

After a bit of brain-grinding I managed to put together a working branchless version:

int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;

u = rand() % 4;
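u += (5-u)*((unsigned int)(x-y) >> 31); // x-y < 0, i.e. y > x: u becomes 5 (assumes 32-bit int)
u += (4-u)*((unsigned int)(x+y) >> 31); // x+y < 0, i.e. -y > x: u becomes 4; applied last, as in the original ifs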

Unfortunately, due to the integer arithmetic involved, the original version with if statements turns out to be about 30% faster.

The compiler knows where the party is at.
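For comparison, here is a multiplication-free sketch of the same idea. It models the missing else as "keep the old value": each comparison is turned into an all-zeros or all-ones mask that selects between the old and the new value of u. The function names (pick_branchy, pick_masked, check_masked) are made up for this illustration, and whether this beats the plain ifs depends on the compiler and CPU, so treat it as something to measure rather than a guaranteed win.

```c
#include <assert.h>

/* Reference: the two consecutive ifs from the question. */
static int pick_branchy(int u, int x, int y)
{
    if ( y > x) u = 5;
    if (-y > x) u = 4;
    return u;
}

/* Branchless sketch: a comparison yields 0 or 1, and negating it gives
   0 or -1 (all bits set), which is used as a selection mask. */
static int pick_masked(int u, int x, int y)
{
    int m5 = -(y > x);          /* -1 if  y > x, else 0 */
    int m4 = -(-y > x);         /* -1 if -y > x, else 0 */
    u = (u & ~m5) | (5 & m5);   /* u = 5 where m5 is set, else unchanged */
    u = (u & ~m4) | (4 & m4);   /* applied last, so 4 wins when both hold */
    return u;
}

/* Exhaustive comparison over the ranges given in the question. */
static void check_masked(void)
{
    for (int u = 0; u < 4; u++)
        for (int x = -50; x < 50; x++)
            for (int y = -50; y < 50; y++)
                assert(pick_masked(u, x, y) == pick_branchy(u, x, y));
}
```

Calling check_masked() walks all 4 * 100 * 100 input combinations and aborts on the first mismatch.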

[All: this answer was written on the assumption that the calls to rand() were part of the problem. I offer improvements below under that assumption. OP belatedly clarified that he only used rand to tell us the ranges (and presumably the distribution) of the values of x and y. It is unclear whether he meant that for the value of u, too. Anyway, enjoy my improved answer to the problem he didn't really pose.]

I think you'd be better off recoding this as:

int u, x, y;
x = rand() % 100 - 50;
y = rand() % 100 - 50;

if (-y > x) u = 4;      // tested first so that 4 wins when both conditions hold, as in the original
else if ( y > x) u = 5;
else u = rand() % 4;

This calls the last rand only about 1/4 as often as OP's original code. Since I assume rand (and the divides) are much more expensive than compare-and-branch, this would be a significant saving.
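The 1/4 figure can be checked by brute force. The sketch below assumes x and y are uniformly distributed over -50..49 (as the rand() % 100 - 50 expressions suggest) and counts the pairs for which neither condition fires, which are the only cases where the final rand() is still needed. The function names are invented for this example.

```c
#include <assert.h>

/* Counts the (x, y) pairs in [-50, 49] x [-50, 49] for which neither
   y > x nor -y > x holds, i.e. where u must still be drawn at random. */
static int count_fallthrough(void)
{
    int n = 0;
    for (int x = -50; x < 50; x++)
        for (int y = -50; y < 50; y++)
            if (!(y > x) && !(-y > x))
                n++;
    return n;
}

/* The fall-through region is x >= |y|, one quarter of the 100 x 100 grid. */
static void check_quarter(void)
{
    assert(count_fallthrough() * 4 == 100 * 100);
}
```

Under the uniformity assumption, 2500 of the 10000 pairs fall through.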

If your rand generator produces a lot of truly random bits (e.g. 16) on each call, as it should, you can call it just once (I've assumed rand is more expensive than a divide; YMMV):

int u, x, y, t;
t = rand() ;
u = t % 4;
t = t >> 2;
x = t % 100 - 50;
y = ( t / 100 ) %100 - 50;

if (-y > x) u = 4;      // tested first so that 4 wins when both conditions hold, as in the original
else if ( y > x) u = 5;

I think the rand function in the MS C library is not good enough for this if you want really random values. I had to code my own; it turned out faster anyway.
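One concrete limitation, independent of the quality of the bits: RAND_MAX is 32767 in the MS C library, so a single call yields only 15 bits, and after the shift by 2 there is not enough range left to cover all 100 values of y. The sketch below (max_y is a helper invented for this illustration) computes the largest y that the single-call scheme can produce for a given RAND_MAX.

```c
#include <assert.h>

/* Largest y produced by y = ((t >> 2) / 100) % 100 - 50
   when t ranges over [0, rand_max]. */
static int max_y(unsigned rand_max)
{
    int best = -50;
    for (unsigned t = 0; t <= rand_max; t++) {
        int y = (int)(((t >> 2) / 100) % 100) - 50;
        if (y > best)
            best = y;
    }
    return best;
}

/* With 15 bits y tops out at 31; with 16 bits the full range up to 49 is reached. */
static void check_max_y(void)
{
    assert(max_y(32767u) == 31);
    assert(max_y(65535u) == 49);
}
```

Even where the maximum is reached, the values of y are not uniformly distributed unless the range of t >> 2 is a multiple of 10000, which is another reason to want more bits per call.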

You might also get rid of the divides by multiplying by a reciprocal (untested):

// needs <stdint.h> and <stdlib.h>
int u, x, y;
uint32_t t;
uint64_t t2;
t = (uint32_t)rand() & 0x7FFF; // assumption: keep 15 bits so that t / 400 stays below 100
u = t % 4;

{ // Compute (t / 400) * 2^32 in a 64-bit integer by multiplying with a
  // fixed-point reciprocal: 10737419 is 2^32 / 400 rounded up.
  // The multiply can be done by one machine instruction
  // (typically 32 bits * 32 bits --> 64 bits) widely found in processors.
  // The "4" in 400 has the same effect as the t = t >> 2 in the previous version.
  t2 = (uint64_t)t * 10737419u;
}
x = (int)(t2 >> 32) - 50; // take the upper word (if compiler won't, do this in assembler)
{ // Compute y from the fractional remainder of the above multiply,
  // which is sitting in the lower 32 bits of the t2 product:
  // scale it by 100 and take the upper word again.
  y = (int)(((t2 & 0xFFFFFFFFu) * 100u) >> 32) - 50;
}

if (-y > x) u = 4;      // tested first so that 4 wins when both conditions hold, as in the original
else if ( y > x) u = 5;

If your compiler won't produce the "right" instructions, it should be straightforward to write assembly code to do this.
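Since a fixed-point reciprocal is only exact over a limited input range, it is worth checking against ordinary division before relying on it. The following sketch assumes t is at most 32767 and uses the constant ceil(2^32 / 400) = 10737419; the names RECIP_400, split and check_split are introduced here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define RECIP_400 10737419u   /* 2^32 / 400, rounded up */

/* Splits t into two values in -50..49 without any divide instruction. */
static void split(uint32_t t, int *x, int *y)
{
    uint64_t t2 = (uint64_t)t * RECIP_400;               /* (t / 400) in 32.32 fixed point */
    *x = (int)(t2 >> 32) - 50;                           /* integer part: t / 400 */
    *y = (int)(((t2 & 0xFFFFFFFFu) * 100u) >> 32) - 50;  /* fraction * 100: (t / 4) % 100 */
}

/* Compares the multiply-based results with plain division for every 15-bit t. */
static void check_split(void)
{
    for (uint32_t t = 0; t <= 32767u; t++) {
        int x, y;
        split(t, &x, &y);
        assert(x == (int)(t / 400) - 50);
        assert(y == (int)((t / 4) % 100) - 50);
    }
}
```

The rounding error of the constant grows with t, so a wider input range would need to be re-verified (or a more precise reciprocal used) rather than assumed to work.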

Here are some tricks using array indices. They may be quite fast if the compiler/CPU has one-step instructions to convert comparison results to 0-1 values (e.g. x86's "sete" and similar).

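int ycpx[4];

/* ... */
ycpx[0] = u;  /* neither condition holds: keep u */
ycpx[1] = 5;  /* only y > x */
ycpx[2] = 4;  /* only -y > x */
ycpx[3] = 4;  /* both hold: 4 wins, as in the original ifs */
u = ycpx[2 * (-y > x) + (y > x)];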

Alternate form

int v1[2];
int v2[2];

/* ... */
v1[0] = u;
v1[1] = 5;
v2[1] = 4;
v2[0] = v1[y > x];
u = v2[-y > x];

Almost unreadable...

NOTE: In both cases the initialization of the array elements containing 4 and 5 may be included in the declaration, and the arrays may be made static if reentrancy is not a problem for you.
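As an illustration of the note, here is a sketch of the alternate form with the constants moved into the initializers and the whole thing wrapped in a function. The arrays are kept automatic (not static), so this variant stays reentrant; the function names are made up for the example, and it is checked against the original ifs over the ranges from the question.

```c
#include <assert.h>

/* Reference: the two consecutive ifs from the question. */
static int pick_branchy(int u, int x, int y)
{
    if ( y > x) u = 5;
    if (-y > x) u = 4;
    return u;
}

/* Two-stage lookup: the first table chooses between u and 5,
   the second between that result and 4. */
static int pick_lookup(int u, int x, int y)
{
    int v1[2] = { u, 5 };
    int v2[2] = { v1[y > x], 4 };
    return v2[-y > x];
}

/* Exhaustive comparison over the ranges given in the question. */
static void check_lookup(void)
{
    for (int u = 0; u < 4; u++)
        for (int x = -50; x < 50; x++)
            for (int y = -50; y < 50; y++)
                assert(pick_lookup(u, x, y) == pick_branchy(u, x, y));
}
```

Whether the compiler turns the indexing into setcc-style instructions or simply folds it back into branches or conditional moves is worth checking in the generated assembly.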
