Fast way to find an intersection between two sets of numbers, one defined by a bitwise condition and another by an arithmetic condition

This is probably well-covered ground, but I'm ignorant on the subject, so I'll be using amateurish terminology. Let's say I'm messing around with some set of conditions that each define a non-empty set of numbers within an int, let's just say an 8-bit int. So for a bitwise one, I may have this:

11XX00XX

That is, I want all bytes that have 1s where there are 1s, 0s where there are 0s, and I don't care about the Xs. So 11110000 or 11000010 fulfills this, but 01110000 does not. Easy enough, right? For arithmetic conditions, I can only imagine there being some use of ==, !=, >=, >, <=, or < in a comparison with a constant number. So I may say:

X > 16

So any number greater than 16 (00010000). What if I want to find all numbers that are in both of the above example sets? I can tell by looking at it that any numbers ending in 100XX will fit the requirements, so the bitwise part of the intersection includes 11X100XX. Then I have to include the region 111X00XX to fill the rest of the range above it, too. Right? Although I think for other cases it wouldn't turn out so neatly, correct? Anyway, what is the algorithm behind this, for any of those arithmetic conditions versus any possible bitwise one? Surely there must be a general case!

Assuming there is one, and it's probably something obvious, what if things get more complicated? What if my bitwise requirement becomes:

11AA00XX

Where anything marked with A must be the same. So 110000XX or 111100XX, but not 111000XX. For any number of same-bit "types" (A, B, C, etc.), in any number and at any positions, what is the optimal way of solving the intersection with some arithmetic comparison? Is there one?

I'm considering these bitwise operations to be sort of a single comparison operation/branch, just like how the arithmetic is set up. So maybe one is all the constants that, when some byte B 01110000 is ANDed with them, result in 00110000. So that region of constants, which is what my "bitwise condition" would be, would be X011XXXX, since X011XXXX AND 01110000 = 00110000. All of my "bitwise conditions" are formed by that reversal of an op like AND, OR, and XOR. Not sure if something like NAND would be included or not. This may limit what conditions are actually possible, maybe? If it does, then I don't care about those types of conditions.

Sorry for the meandering attempt at an explanation. Is there a name for what I'm doing? It seems like it'd be well-trodden ground in CS, so a name could lead me to some nice reading on the subject. But I'm mostly just looking for a good algorithm to solve this. I am going to end up using more than 2 things in the intersection (potentially dozens or many, many more), so a way to solve it that scales well would be a huge plus.

Bitwise

Ok, so we look at bitwise operations, as that is the most efficient way of doing what you want. For clarity (and reference), bitwise values converted to decimal are

00000001 =   1
00000010 =   2
00000100 =   4
00001000 =   8
00010000 =  16
00100000 =  32
01000000 =  64
10000000 = 128

Now, given a bit pattern of 11XX00XX on the variable x, we would perform the following checks:

(x AND 128) != 0
(x AND 64)  != 0
(x AND 8)   == 0
(x AND 4)   == 0

If all those conditions are true, then the value matches the pattern. Essentially, you are checking the following conditions:

(1XXXXXXX AND 10000000) != 0
(X1XXXXXX AND 01000000) != 0
(XXXX0XXX AND 00001000) == 0
(XXXXX0XX AND 00000100) == 0

To put that together in programming language parlance (I'll use C#), you'd look for

if ((x & 128) != 0 && (x & 64) != 0 && (x & 8) == 0 && (x & 4) == 0)
{
    // We have a match
}
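As a quick sanity check, here is a minimal Python sketch of the same four bit tests (Python's & works like the bitwise operator above on small non-negative ints). The function name matches_11xx00xx is made up for illustration. It then brute-forces the intersection with the question's x > 16 condition over all 8-bit values.

```python
def matches_11xx00xx(x: int) -> bool:
    """Per-bit test for the pattern 11XX00XX on an 8-bit value."""
    return (
        (x & 128) != 0      # bit 7 must be 1
        and (x & 64) != 0   # bit 6 must be 1
        and (x & 8) == 0    # bit 3 must be 0
        and (x & 4) == 0    # bit 2 must be 0
    )

# Brute-force the intersection with the arithmetic condition x > 16.
both = [x for x in range(256) if matches_11xx00xx(x) and x > 16]
print(len(both), min(both))
```

Every value matching 11XX00XX is at least 192 (11000000), so for this particular pair the arithmetic condition removes nothing and all 16 pattern members survive.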

For the more complicated bit pattern of 11AA00XX, you would add the following condition:

NOT (((x AND 32) != 0) XOR ((x AND 16) != 0))

What this does is first check (x AND 32) != 0, which yields either a 0 or 1 based on the value of that bit in x. (The comparison matters: x AND 32 on its own yields 0 or 32, not 0 or 1.) Then it makes the same check on the other bit, (x AND 16) != 0. The XOR operation checks for the difference in bits, returning a 1 if the bits are DIFFERENT and a 0 if the bits are the same. From there, since we want a 1 if they are the same, we NOT the whole clause. This will return a 1 if the bits are the same.
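A minimal Python sketch of the linked-bit check for 11AA00XX follows. The function names are hypothetical. Shifting each bit down to position 0 before the XOR matters here, because the raw masked values 32 and 16 differ even when both bits are set.

```python
def linked_bits_equal(x: int) -> bool:
    """True when bit 5 and bit 4 of x hold the same value (the 'AA' in 11AA00XX)."""
    bit5 = (x >> 5) & 1   # normalise each bit to 0 or 1 first
    bit4 = (x >> 4) & 1
    return (bit5 ^ bit4) == 0   # XOR is 0 only when the bits agree


def matches_11aa00xx(x: int) -> bool:
    """Fixed 1s and 0s checked with one mask, then the linked pair."""
    fixed_ok = (x & 0b11001100) == 0b11000000
    return fixed_ok and linked_bits_equal(x)


print(matches_11aa00xx(0b11110000), matches_11aa00xx(0b11100000))
```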


Arithmetically

On the arithmetic side, you'd look at using a combination of division and modulus operations to isolate the bit in question. To work with division, you start by finding the highest power of two that does not exceed the number. In other words, if you have x=65, the highest power of two is 64.

Once you've done the division, you then use modulus to take the remainder. As in the example above, given x=65, x MOD 64 == 1. With that number, you repeat what you did before, finding the highest power of two, and continuing until the modulus returns 0.
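The loop described above can be sketched in Python as follows. The function name bits_by_division is made up for illustration, and the input is assumed to be a non-negative integer.

```python
def bits_by_division(x: int) -> list[int]:
    """Return the powers of two that sum to x, largest first."""
    powers = []
    while x > 0:
        p = 1
        while p * 2 <= x:   # largest power of two not exceeding x
            p *= 2
        powers.append(p)
        x = x % p           # remainder after removing that power
    return powers


print(bits_by_division(65))
```

For x=65 this yields 64 and then 1, matching the worked example: each power in the list corresponds to a 1 bit in x.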

Extending a bit on saluce's answer:

Bit testing

You can build test patterns, so you don't need to check each bit individually (testing the whole number is faster than testing one bit at a time, especially since each one-bit-at-a-time test operates on the whole number anyway):

testOnes = 128 | 64 // the bits we want to be 1
testZeros = ~(8 | 4) // the bits we want to be 0, inverted

Then perform your test this way:

if (!(~(x & testOnes) & testOnes) &&
    !(~(~x | testZeros))) {
  /* match! */
}

Logic explained:

First of all, in testOnes the bits of interest are set to 1 and the rest to 0; testZeros is inverted, so its bits of interest are 0 and the rest are 1.

testOnes testZeros
11000000 11110011

Then x & testOnes will zero out all bits we don't want to test for being ones (note the difference between & and &&: & performs the logical AND operation bitwise, whereas && is the logical AND on the whole number).

testOnes
11000000
x        x & testOnes
11110000 11000000
11000010 11000000
01010100 01000000

Now at most the bits we are testing for being 1 can be 1, but we don't know if all of them are 1s: by inverting the result ( ~(x & testOnes) ), all bits we don't care about become 1s, and the bits we would like to test are either 0 (if they were 1) or 1 (if they were 0).

testOnes
11000000
x        ~(x & testOnes)
11110000 00111111
11000010 00111111
01010100 10111111

By bitwise-ANDing it with testOnes we get 0 if the bits of interest were all 1s in x, and non-zero otherwise.

testOnes
11000000
x        ~(x & testOnes) & testOnes
11110000 00000000
11000010 00000000
01010100 10000000

At this point we have: 0 if all bits we wanted to test for 1 were actually 1s, and non-0 otherwise, so we perform a logical NOT to turn the 0 into true and the non-0 into false.

x        !(~(x & testOnes) & testOnes)
11110000 true
11000010 true
01010100 false

The test for zero bits is similar, but we need to use bitwise OR ( | ) instead of bitwise AND ( & ). First, we flip x, so the should-be-0 bits become should-be-1; then the OR turns all non-interest bits into 1 while keeping the rest. At this point we have all 1s if the should-be-0 bits in x were indeed 0, and not all 1s otherwise, so we flip the result again to get 0 in the first case and non-0 in the second. Then we apply logical NOT ( ! ) to convert the result to true (first case) or false (second case).

testZeros
11110011
x        ~x       ~x | testZeros ~(~x | testZeros) !(~(~x | testZeros))
11110000 00001111 11111111       00000000          true
11000010 00111101 11111111       00000000          true
01010100 10101011 11111011       00000100          false

Note: We have performed 4 operations for each test, so 8 in total. Depending on the number of bits you want to test, this might still be fewer than checking each bit individually.
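As an aside, both halves of the test can be folded into a single AND plus one comparison. The sketch below is an illustration rather than part of the answer: the names CARE, matches and matches_long_form are assumptions, and the masks are built with | so that they equal 11000000 and 11110011 as in the tables above. The long form is included only to check that the two agree; the & 0xFF keeps Python's unbounded ints to 8 bits.

```python
TEST_ONES = 128 | 64               # 0b11000000, bits that must be 1
MUST_BE_ZERO = 8 | 4               # 0b00001100, bits that must be 0
CARE = TEST_ONES | MUST_BE_ZERO    # every bit the pattern constrains


def matches(x: int) -> bool:
    """One AND plus one comparison covers both the 1-bit and 0-bit tests."""
    return (x & CARE) == TEST_ONES


def matches_long_form(x: int) -> bool:
    """The two-part test from the answer, restricted to 8 bits."""
    test_zeros = ~MUST_BE_ZERO & 0xFF                  # 0b11110011
    ones_ok = (~(x & TEST_ONES) & TEST_ONES) == 0
    zeros_ok = (~((~x & 0xFF) | test_zeros) & 0xFF) == 0
    return ones_ok and zeros_ok


print(all(matches(x) == matches_long_form(x) for x in range(256)))
```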

Arithmetic testing

Testing for equality/difference is easy: XOR the number with the tested one. You get 0 if all bits were equal (thus the numbers were equal), and non-0 if at least one bit was different (thus the numbers were different). (Applying NOT turns the equal result into true and a difference into false.)

To test for inequality (greater than or less than), however, you are out of luck most of the time, at least as far as logical operations go. You are correct that checking against powers of 2 (e.g. 16 in your question) can be done with logical operations (bitwise AND and a test for 0), but for numbers that are not powers of 2 this is not so easy. As an example, let's test for x>5: the pattern is 00000101, so how do you test? The number is greater if it has a 1 in the first 5 most significant bits, but 6 (00000110) is also larger with all of the first five bits being 0.

The best you could do is test if the number is at least twice as large as the highest power of 2 in the test number (4 for 5 in the above example). If yes, then it is larger than the test number; otherwise, it has to be at least as large as the highest power of 2 in the test number, and you then perform the same test on the less significant bits. As you can see, the number of operations is dynamic, based on the number of 1 bits in the test number.
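One way to make this concrete, offered as a sketch and not as the answer's own method: x > c holds exactly when x agrees with c on some run of leading bits and then has a 1 where c has a 0. That gives one don't-care pattern per 0 bit of c, and the union of those patterns is the set x > c. The function names are hypothetical, and c is assumed to fit in width bits.

```python
def greater_than_patterns(c: int, width: int = 8) -> list[str]:
    """Patterns over '0', '1', 'X' whose union is exactly {x : x > c}."""
    bits = format(c, f"0{width}b")
    patterns = []
    for i, bit in enumerate(bits):
        if bit == "0":
            # Agree with c above position i, exceed it here, anything below.
            patterns.append(bits[:i] + "1" + "X" * (width - i - 1))
    return patterns


def matches(pattern: str, x: int) -> bool:
    """Check x against a pattern string, most significant bit first."""
    value = format(x, f"0{len(pattern)}b")
    return all(p in ("X", v) for p, v in zip(pattern, value))


for pattern in greater_than_patterns(5):
    print(pattern)
```

For c = 5 this lists six patterns, the last being 0000011X, which is the case that covers 6 and 7. Once the arithmetic condition is expressed as patterns, intersecting it with a bitwise condition becomes a position-by-position comparison of pattern strings.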

Linked bits

Here, XOR is your friend: for two bits, XOR yields 0 if they are the same, and 1 otherwise.

I do not know of a simple way to perform the test, but the following algorithm should help:

Assume you need bits b1, ..., bn to be the same (all 1s or all 0s). Zero out all other bits (see the bitwise AND above), then isolate each bit in the test pattern, then line them up at the same position (let's make it the least significant bit for convenience). If the bits were the same in the original number, XOR-ing two of them, then XOR-ing the third with the result, and so on, yields an even number at every odd step; at every even step it yields an odd number when the bits are all 1s and 0 when they are all 0s. You will need to test at every step, as testing only the final result can be incorrect for a larger number of linked bits.

testLinks
00110000
x        x & testLinks
11110000 00110000
11000010 00000000
01010100 00010000

x        x's bits isolated isolated bits shifted
11110000 00100000          00000001
         00010000          00000001
11000010 00000000          00000000
         00000000          00000000
01010100 00000000          00000000
         00010000          00000001

x        x's bits XOR'd result
11110000 00000000       true (1st round, even result)
11000010 00000000       true (1st round, even result)
01010100 00000001       false (1st round, odd result)

Note: In C-like languages the XOR operator is ^.

Note: How do you line bits up at the same position? Bit shifting. Shift-left ( << ) shifts all bits to the left, "losing" the most significant bit and "introducing" a 0 at the least significant bit, essentially multiplying the number by 2. Shift-right ( >> ) operates similarly, shifting bits to the right, essentially integer-dividing the number by 2; however, on signed types it "introduces" at the most significant bit the same bit that was there already (thus keeping negative numbers negative).

TLDR saluce's answer:

Bitwise checks consider individual bits separately, and arithmetic checks consider all the bits together. True enough, they coincide for powers of 2, but not for any arbitrary number.

So if you have both, you will need to implement both sets of checks.

The space of all possible values of a 32-bit int is a bit big to store, so you'll have to check them all each time. Just make sure you're using short-circuits to eliminate duplicative checks like x > 5 || x > 3.

You've defined a decent DSL for specifying the mask. I would write a parser that reads that mask and performs operations specific to each unique character.

AABBB110 = mask

Step 1: Extract all unique characters into an array [01AB]. You can omit 'X', as no operation is needed.

Step 2: Iterate through that array, processing your text mask into separate bit masks, one for each unique character, with a 1 at every position where that character appears and a 0 everywhere else.

Mask_0 = 00000001 = 0x01
Mask_1 = 00000110 = 0x06
Mask_A = 11000000 = 0xC0
Mask_B = 00111000 = 0x38

Step 3: Pass each mask to its appropriate function as defined below.

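// True when every bit selected by mask is 0 in data.
boolean mask_zero(byte data, byte mask) {
  return (data & mask) == 0;
}

// True when every bit selected by mask is 1 in data.
boolean mask_one(byte data, byte mask) {
  return (data & mask) == mask;
}

// True when the bits selected by mask are all 0 or all 1 in data.
boolean mask_same(byte data, byte mask) {
  int masked = data & mask;  // & promotes to int, so keep the result as int
  return (masked == 0) || (masked == mask);
}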
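Steps 1 to 3 can be put together in a short Python sketch. The function names build_masks and matches are made up for illustration, and the sketch assumes that any character other than 0, 1 and X marks a group of linked bits.

```python
def build_masks(pattern: str) -> dict[str, int]:
    """Steps 1 and 2: one bit mask per distinct character, skipping 'X'."""
    masks: dict[str, int] = {}
    width = len(pattern)
    for i, ch in enumerate(pattern):
        if ch != "X":
            masks[ch] = masks.get(ch, 0) | (1 << (width - 1 - i))
    return masks


def matches(pattern: str, data: int) -> bool:
    """Step 3: dispatch each mask to the zero, one, or same-bits test."""
    for ch, mask in build_masks(pattern).items():
        masked = data & mask
        if ch == "0":
            ok = masked == 0
        elif ch == "1":
            ok = masked == mask
        else:                      # any other letter links its bits together
            ok = masked == 0 or masked == mask
        if not ok:
            return False
    return True


print(build_masks("AABBB110"))
print(matches("AABBB110", 0b11000110))
```

In real use the masks would be built once per pattern and cached; rebuilding them on every call, as this sketch does, keeps the example short but wastes work.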

What format do you want the result set in? Both the arithmetic set (let's call this A) and the bitwise set (let's call this B) have the advantage of being quickly testable and the advantage of being easily iterable. But each of those kinds of definition can define things that the other can't, so the intersection of them needs to be something else entirely.

What I would do is handle testing and iteration separately. An easily testable definition can be created by converting both sets to arbitrary mathematical expressions (the bitwise set can be converted to a few bitwise operations, as other posters have described) and combining them with a logical "and". This is easily generalized to sets of any kind: simply store references to both of the parent sets, and when asked whether a number is in both sets, just check with both of the parent sets.

However, an arbitrary mathematical expression is not easy to iterate over at all. For iteration, the simplest method is to iterate over set B (which can be done by changing only the bits that aren't constrained by the set), and allow set A to constrain the result. If A uses > or >=, then iterate down (from the maximum number) and halt on false for maximum efficiency; if A uses < or <=, then iterate up (from the minimum number) and halt on false. If A uses ==, then there's only one number to check, and if A uses !=, then either direction is fine (but you can't halt on false).

Note that a bitwise set can behave like an indexable array of numbers. For example, the bitwise set defined by 11XX00XX can be treated as an array with indexes ranging from 0000 to 1111, with the bits of the index being fit into the corresponding slots. This makes it easy to iterate up or down over the set. Set A can be indexed in a similar way, but since it can easily be an infinite set (unless constrained by your machine's int size, though it doesn't have to be, e.g. with BigInteger), it isn't the safest thing to iterate over.
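The indexing idea can be sketched in Python as below. This is an illustration under stated assumptions: the function names are hypothetical, and only patterns made of 0, 1 and X are handled (no linked letters). Because the index bits are placed into the X slots from most to least significant, a larger index always gives a larger member, which is what makes halting on the first false safe.

```python
def nth_member(pattern: str, index: int) -> int:
    """Scatter the bits of index into the X slots of pattern (MSB first)."""
    value = 0
    x_slots = pattern.count("X")
    for ch in pattern:
        value <<= 1
        if ch == "X":
            x_slots -= 1
            value |= (index >> x_slots) & 1   # next index bit, high to low
        elif ch == "1":
            value |= 1
    return value


def intersection_greater_than(pattern: str, c: int) -> list[int]:
    """Walk the members from largest to smallest, stop once x > c fails."""
    result = []
    for index in range(2 ** pattern.count("X") - 1, -1, -1):
        x = nth_member(pattern, index)
        if not x > c:
            break              # members only get smaller from here
        result.append(x)
    return result


print(intersection_greater_than("11XX00XX", 224))
```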
