
Branchless Overflow Handling

I'm trying to create a type of safe buffer that automatically handles overflow without any branching. The buffer size is a power of two, and only positive indices (i.e. not including zero) are valid. It also allows checked removal, which removes the element at a given index only if the element stored there is equal to a search key.

I was essentially going for something like this:

Element *buffer[256];

inline void buffer_insert(size_t index, Element *elem){
  buffer[index < 256 && index] = elem;
}

//Optional: checked insert to prevent overwrite. Will only insert
//if the buffer holds NULL at index.
inline void buffer_checkedInsert(size_t index, Element * elem){
  buffer[index && !buffer[index < 256 && index]] = elem;  
}

inline void buffer_checkedRemove(size_t index, Element *elem){
  buffer[0] = NULL; //Maybe useful if buffer[0] stores elem
  buffer[(elem == buffer[index < 256 && index]) && index] = NULL;
}

So I basically want to access index 0 whenever the index passed in is out of bounds, as buffer[0] is not a valid buffer index. I also want to access index 0 whenever the element to be removed is not equal to the element passed into the removal, and I might also want to access index 0 if the buffer already contains something at index.

My questions are:

  • Is what I have really branchless? Because if the C compiler decides to use short-circuiting on &&, the code might get branched.
  • If && causes branching, is there an alternative that has the same behavior in this case that does not involve branching?
  • Can this be faster than a basic overflow check? Or could the C compiler somehow give a branchless version of if(index < 256) buffer[index] = elem?

Is what I have really branchless? Because if the C compiler decides to use short-circuiting on &&, the code might get branched.

Maybe. The compiler might be clever enough to emit branchless machine code in these cases, but you cannot rely on it.

If && causes branching, is there an alternative that has the same behavior in this case that does not involve branching?

Your question is a bit confused. The fact that a compiler may emit branching code to implement the && operation follows from the defined behavior of that operation. Any alternative that has the same behavior must afford the same possibility of branching.

On the other hand, if you mean to ask whether there is an alternative that computes the same result in all cases, then yes, you can rewrite those expressions to do so without the possibility of branching. For instance, you could use either the & or the * operator like so:

buffer[(index < 256) & (index != 0)] = elem;

Note that this expression, like the original one with &&, only ever evaluates to 0 or 1, never to index itself. Or, you could implement the behavior you actually want:

buffer[(index < 256) * index] = elem;
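As a rough illustration, the three functions from the question could be rewritten around that multiplication. This is only a sketch: the Element struct body, the BUFFER_SIZE macro and the safe_index helper are hypothetical names introduced here, not part of the original post.

```c
#include <stddef.h>

/* Hypothetical element type; the question never defines Element. */
typedef struct Element { int value; } Element;

#define BUFFER_SIZE 256  /* power of two, as in the question */

/* Slot 0 is a scratch slot: every rejected access lands there. */
static Element *buffer[BUFFER_SIZE];

/* Map index to itself when index < BUFFER_SIZE, otherwise to 0.
   The comparison yields 0 or 1, so the product is either index or 0. */
static inline size_t safe_index(size_t index) {
    return (size_t)(index < BUFFER_SIZE) * index;
}

static inline void buffer_insert(size_t index, Element *elem) {
    buffer[safe_index(index)] = elem;
}

/* Store elem only if the target slot currently holds NULL;
   otherwise the write is redirected to the scratch slot. */
static inline void buffer_checkedInsert(size_t index, Element *elem) {
    size_t i = safe_index(index);
    buffer[(size_t)(buffer[i] == NULL) * i] = elem;
}

/* Clear the slot only if it currently holds elem;
   otherwise the write is redirected to the scratch slot. */
static inline void buffer_checkedRemove(size_t index, Element *elem) {
    size_t i = safe_index(index);
    buffer[0] = NULL;  /* scratch slot must not accidentally match elem */
    buffer[(size_t)(buffer[i] == elem) * i] = NULL;
}
```

With this sketch, buffer_insert(256, e) writes to buffer[0] rather than out of bounds, and buffer_checkedInsert leaves an occupied slot untouched. Whether the generated machine code is actually branch-free still depends on the compiler and target.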

There's no reason to think that the compiler would emit a branch instruction for either of those computations; if it did, that would probably be because it thinks that would provide a performance improvement on the target architecture.
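A third way to compute the same index, not mentioned above, is to turn the comparison result into a bit mask. The following is a minimal sketch of that idea; the function name safe_index_mask is made up for illustration.

```c
#include <stddef.h>

/* Returns index when index < 256, otherwise 0, using a mask
   instead of a multiplication. */
static inline size_t safe_index_mask(size_t index) {
    /* (index < 256) is 0 or 1. Subtracting it from 0 in unsigned
       arithmetic gives 0 or all-ones (SIZE_MAX). */
    size_t mask = (size_t)0 - (size_t)(index < 256);
    return index & mask;
}
```

Compilers are free to turn the multiplication form into something like this on their own, so the two variants may well compile to the same instructions.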

Can this be faster than a basic overflow check? Or could the C compiler somehow give a branchless version of if(index < 256) buffer[index] = elem?

The branchless versions certainly can be faster. They are most likely to be observably faster on workloads where the (non-)branch is executed a lot, and there is no easily-discernible pattern to which alternative is taken. But if the (non-)branching mostly follows a regular pattern, and especially if it almost always goes one way, then the CPU's branch prediction unit could make an ordinary validity check at least as fast as the branchless assignments.

Ultimately, there's no good reason to worry about this without benchmarking the actual performance of your code on real data, or a good facsimile thereof. The result is likely to be data dependent, and whether it matters at all depends on how much of the program's run time is spent in the functions you ask about. Until and unless you have a good benchmark demanding otherwise, you should code for clarity and maintainability.
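A very rough starting point for such a benchmark might look like the sketch below. All names here (insert_branchy, insert_branchless, time_inserts, fill_random_indices) are hypothetical, and the random indices are only a stand-in for real data. Calling through a function pointer adds overhead and limits inlining, so treat any numbers from it as indicative at best.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical element type; the question never defines Element. */
typedef struct Element { int value; } Element;

static Element *buffer[256];

/* Ordinary bounds check, as written in the question. */
static void insert_branchy(size_t index, Element *elem) {
    if (index < 256) buffer[index] = elem;
}

/* Branchless form: out-of-range writes go to scratch slot 0. */
static void insert_branchless(size_t index, Element *elem) {
    buffer[(size_t)(index < 256) * index] = elem;
}

/* Run fn over n indices and return the elapsed CPU time in seconds. */
static double time_inserts(void (*fn)(size_t, Element *),
                           const size_t *indices, size_t n,
                           Element *elem) {
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        fn(indices[i], elem);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Fill indices with pseudo-random values in [0, 512), so that about
   half of them are out of range with no regular pattern. */
static void fill_random_indices(size_t *indices, size_t n, unsigned seed) {
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        indices[i] = (size_t)rand() % 512;
    }
}
```

One would fill a large array with fill_random_indices, call time_inserts once per variant, and compare the results over several runs and optimization levels. Changing the index distribution, for example to mostly in-range values, shows how much the outcome depends on the data.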
