
Fastest way to determine if an integer is between two integers (inclusive) with known sets of values

Is there a faster way than x >= start && x <= end in C or C++ to test if an integer is between two integers?

UPDATE: My specific platform is iOS. This is part of a box blur function that restricts pixels to a circle in a given square.

UPDATE: After trying the accepted answer, I got an order of magnitude speedup on the one line of code over doing it the normal x >= start && x <= end way.

UPDATE: Here is the after and before code with assembler from Xcode:

NEW WAY

// diff = (end - start) + 1
#define POINT_IN_RANGE_AND_INCREMENT(p, range) ((p++ - range.start) < range.diff)

Ltmp1313:
 ldr    r0, [sp, #176] @ 4-byte Reload
 ldr    r1, [sp, #164] @ 4-byte Reload
 ldr    r0, [r0]
 ldr    r1, [r1]
 sub.w  r0, r9, r0
 cmp    r0, r1
 blo    LBB44_30

OLD WAY

#define POINT_IN_RANGE_AND_INCREMENT(p, range) (p <= range.end && p++ >= range.start)

Ltmp1301:
 ldr    r1, [sp, #172] @ 4-byte Reload
 ldr    r1, [r1]
 cmp    r0, r1
 bls    LBB44_32
 mov    r6, r0
 b      LBB44_33
LBB44_32:
 ldr    r1, [sp, #188] @ 4-byte Reload
 adds   r6, r0, #1
Ltmp1302:
 ldr    r1, [r1]
 cmp    r0, r1
 bhs    LBB44_36

Pretty amazing how reducing or eliminating branching can provide such a dramatic speedup.

There's an old trick to do this with only one comparison/branch. Whether it'll really improve speed may be open to question, and even if it does, it's probably too little to notice or care about, but when you're only starting with two comparisons, the chances of a huge improvement are pretty remote. The code looks like:

// use a < for an inclusive lower bound and exclusive upper bound
// use <= for an inclusive lower bound and inclusive upper bound
// alternatively, if the upper bound is inclusive and you can pre-calculate
//  upper-lower, simply add + 1 to upper-lower and use the < operator.
    if ((unsigned)(number-lower) <= (upper-lower))
        in_range(number);

With a typical, modern computer (i.e., anything using two's complement), the conversion to unsigned is really a no-op: just a change in how the same bits are viewed.

Note that in a typical case, you can pre-compute upper-lower outside a (presumed) loop, so that doesn't normally contribute any significant time. Along with reducing the number of branch instructions, this also (generally) improves branch prediction. In this case, the same branch is taken whether the number is below the bottom end or above the top end of the range.

As to how this works, the basic idea is pretty simple: a negative number, when viewed as an unsigned number, will be larger than anything that started out as a positive number.

In practice this method translates number and the interval to the origin and checks whether number is in the interval [0, D], where D = upper - lower. If number is below the lower bound, the difference is negative (and so becomes a very large unsigned value); if it is above the upper bound, the difference is larger than D.
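The single-comparison check can be written as a small function. This is only a minimal sketch: the name in_range_inclusive is hypothetical, it assumes 32-bit integers and lower <= upper, and it does both subtractions on unsigned values so that wraparound is well defined rather than signed overflow.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when lower <= number <= upper.
// Assumes lower <= upper. All arithmetic is done on unsigned values so that
// a number below the lower bound wraps around to a very large offset.
inline bool in_range_inclusive(std::int32_t number,
                               std::int32_t lower,
                               std::int32_t upper)
{
    const std::uint32_t offset =
        static_cast<std::uint32_t>(number) - static_cast<std::uint32_t>(lower);
    const std::uint32_t width =
        static_cast<std::uint32_t>(upper) - static_cast<std::uint32_t>(lower);
    return offset <= width;  // one comparison covers both bounds
}
```

For example, with lower = 3 and upper = 7, the value 2 gives an offset of 0xFFFFFFFF, which is larger than the width of 4, so the test fails just as it does for 8.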

It's rare to be able to do significant optimizations to code on such a small scale. Big performance gains come from observing and modifying the code from a higher level. You may be able to eliminate the need for the range test altogether, or only do O(n) of them instead of O(n^2). You may be able to re-order the tests so that one side of the inequality is always implied. Even if the algorithm is ideal, gains are more likely to come when you see how this code does the range test 10 million times and you find a way to batch them up and use SSE to do many tests in parallel.
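One way to batch the tests is to put them in a tight loop over contiguous data with the bounds hoisted out. The sketch below is an illustration only: count_in_range is a hypothetical name, it assumes lower <= upper, and whether a given compiler turns the loop into SIMD instructions has to be checked in the generated assembly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts how many values lie in [lower, upper].
// Assumes lower <= upper.
inline std::size_t count_in_range(const std::vector<std::int32_t>& values,
                                  std::int32_t lower,
                                  std::int32_t upper)
{
    // Both constants are computed once, outside the loop.
    const std::uint32_t base  = static_cast<std::uint32_t>(lower);
    const std::uint32_t width = static_cast<std::uint32_t>(upper) - base;

    std::size_t count = 0;
    for (std::int32_t v : values) {
        // The comparison result (0 or 1) is added directly, so the source
        // contains no if statement that depends on the data.
        count += (static_cast<std::uint32_t>(v) - base) <= width;
    }
    return count;
}
```

Keeping the loop body this simple gives the optimizer the best chance to vectorize it; hand-written SSE or NEON intrinsics would be the next step if it does not.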

It depends on how many times you want to perform the test over the same data.

If you are performing the test a single time, there probably isn't a meaningful way to speed up the algorithm.

If you are doing this for a very finite set of values, then you could create a lookup table. Performing the indexing might be more expensive, but if you can fit the entire table in cache, then you can remove all branching from the code, which should speed things up.

For your data the lookup table would have 128^3 = 2,097,152 entries. If you can control one of the three variables so you consider all instances where start = N at one time, then the size of the working set drops to 128^2 = 16,384 bytes, which should fit well in most modern caches.

You would still have to benchmark the actual code to see if a branchless lookup table is sufficiently faster than the obvious comparisons.
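A table for one fixed start could look roughly like this. It is a sketch under assumptions: x, start and end are all taken to lie in [0, 128), one byte is stored per entry, and the names kValues, RangeTable, build_range_table and in_range_lookup are made up for illustration.

```cpp
#include <array>
#include <cstdint>

constexpr int kValues = 128;  // assumed domain: x, start, end in [0, 128)

// table[end][x] is 1 when start <= x <= end, for one fixed start.
// 128 x 128 one-byte entries.
using RangeTable = std::array<std::array<std::uint8_t, kValues>, kValues>;

// Builds the table once per start value; the cost is paid outside the hot loop.
inline RangeTable build_range_table(int start)
{
    RangeTable table{};
    for (int end = 0; end < kValues; ++end) {
        for (int x = 0; x < kValues; ++x) {
            table[end][x] = (x >= start && x <= end) ? 1 : 0;
        }
    }
    return table;
}

// The range test becomes a single indexed load with no comparison of bounds.
inline bool in_range_lookup(const RangeTable& table, int x, int end)
{
    return table[end][x] != 0;
}
```

The table has to be rebuilt whenever start changes, so this only pays off when many tests share the same start.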

This answer reports on a test of the accepted answer. I performed a closed range test on a large vector of sorted random integers, and to my surprise the basic method of (low <= num && num <= high) is in fact faster than the accepted answer above! The test was done on an HP Pavilion g6 (AMD A6-3400APU with 6GB RAM). Here's the core code used for testing:

int num = rand();  // num to compare in consecutive ranges.
// start and end are each declared once, by auto, to avoid a redeclaration error.
auto start = chrono::system_clock::now();

int inBetween1{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
    if (randVec[i - 1] <= num && num <= randVec[i])
        ++inBetween1;
}
auto end = chrono::system_clock::now();
chrono::duration<double> elapsed_s1 = end - start;

Compared with the following, which is the accepted answer above:

int inBetween2{ 0 };
for (int i = 1; i < MaxNum; ++i)
{
    if (static_cast<unsigned>(num - randVec[i - 1]) <= (randVec[i] - randVec[i - 1]))
        ++inBetween2;
}

Note that randVec is a sorted vector. For any size of MaxNum, the first method beats the second one on my machine!

For any variable range check:

if (x >= minx && x <= maxx) ...

It is faster to use a bitwise operation:

if ( ((x - minx) | (maxx - x)) >= 0) ...

This will reduce two branches into one.

If you care about type safety:

if ((int32_t)(((uint32_t)x - (uint32_t)minx) | ((uint32_t)maxx - (uint32_t)x)) >= 0) ...

You can combine several range checks together:

if (( (x - minx) | (maxx - x) | (y - miny) | (maxy - y) ) >= 0) ...

This will reduce 4 branches into 1.
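The combined check relies on the sign bit: each difference is negative exactly when its bound is violated, and OR-ing the differences keeps any sign bit that was set. A minimal sketch follows, with a hypothetical function name and the assumption that every difference fits in 31 bits (true for typical pixel coordinates); it inspects the top bit of an unsigned value instead of casting back to a signed type.

```cpp
#include <cstdint>

// Hypothetical helper: true when minx <= x <= maxx and miny <= y <= maxy.
// Assumes each difference below fits in 31 bits, so bit 31 of a difference
// is set only when that difference is negative.
inline bool point_in_rect(std::int32_t x, std::int32_t y,
                          std::int32_t minx, std::int32_t maxx,
                          std::int32_t miny, std::int32_t maxy)
{
    const auto u = [](std::int32_t v) { return static_cast<std::uint32_t>(v); };

    // Unsigned subtraction wraps instead of overflowing.
    const std::uint32_t bits = (u(x) - u(minx)) | (u(maxx) - u(x))
                             | (u(y) - u(miny)) | (u(maxy) - u(y));

    // If any single difference was negative, bit 31 of the OR is set.
    return (bits >> 31) == 0;
}
```

If the coordinates can span nearly the full 32-bit range, the 31-bit assumption fails and the plain comparisons are the safer choice.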

It is 3.4 times faster than the old one in gcc.

I can tell you exactly why this would matter. Imagine you're simulating an MMU. You are constantly having to make sure a given memory address exists with a given page set. Those little bits add up very quickly because you're always asking:

  • Is this address valid?
  • What page is this address part of?
  • What rights does this page have?

Is it not possible to just perform a bitwise operation on the integer?

Since it has to be between 0 and 128, if the 8th bit is set (2^7) it is 128 or more. The edge case will be a pain, though, since you want an inclusive comparison.
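As a rough sketch of that idea, assuming a 32-bit signed input and using hypothetical function names: masking off the low seven bits tests for 0..127 in one operation, and the inclusive upper bound of 128 needs a separate check.

```cpp
#include <cstdint>

// True when 0 <= x <= 127: no bit above the low seven may be set.
// Negative inputs have high bits set in two's complement, so they fail too.
inline bool in_0_to_127(std::int32_t x)
{
    return (static_cast<std::uint32_t>(x) & ~std::uint32_t{0x7F}) == 0;
}

// The inclusive bound of 128 is the awkward edge case: 128 itself has
// bit 7 set, so it has to be admitted separately.
inline bool in_0_to_128(std::int32_t x)
{
    return in_0_to_127(x) || x == 128;
}
```

The extra comparison for 128 removes most of the benefit, so for an inclusive bound that is not of the form 2^k - 1 the unsigned-subtraction test is likely the simpler option.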

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address.
