Create a sequence which is ordered by bits set

I'm looking for a reversible function unsigned f(unsigned) for which the number of bits set in f(i) increases with i, or at least does not decrease. Obviously, f(0) has to be 0 then, and ~0 must come last, i.e. f(~0) == ~0. In between there's more flexibility. After f(0), the next 32* values must be 1U<<0 to 1U<<31, but I don't care a lot about the order (they all have 1 bit set).

I'd like an algorithm which doesn't need to calculate f(0)...f(i-1) in order to calculate f(i), and a complete table is also unworkable.

This is similar to Gray codes, but I can't see a way to reuse that algorithm. I'm trying to use this to label a large data set, and prioritize the order in which I search them. The idea is that I have a key C, and I'll check labels C ^ f(i). Low values of i should give me labels similar to C, i.e. differing in only a few bits.

[*] Bonus points for not assuming that unsigned has 32 bits.

[example] A valid initial sequence:

0, 1, 2, 4, 16, 8 ... // 16 and 8 both have one bit set, so they compare equal

An invalid initial sequence:

0, 1, 2, 3, 4 ... // 3 has two bits set, so it cannot precede 4 or 2147483648.
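A small checker makes the requirement concrete. This is a sketch only: the helper names countOneBits and bitsNeverDecrease are hypothetical, and std::bitset does the counting so that unsigned is not assumed to have 32 bits.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>
#include <vector>

// Number of set bits, without assuming that unsigned has 32 bits.
unsigned countOneBits(unsigned x)
{
    return static_cast<unsigned>(std::bitset<sizeof(unsigned) * CHAR_BIT>(x).count());
}

// True if no element has fewer bits set than the element before it.
bool bitsNeverDecrease(const std::vector<unsigned>& sequence)
{
    for (std::size_t i = 1; i < sequence.size(); ++i)
        if (countOneBits(sequence[i]) < countOneBits(sequence[i - 1]))
            return false;
    return true;
}
```

A prefix such as 0, 1, 2, 3 also passes this check even though it is invalid, because 4 and the other one-bit values would then have to come after 3; the check only captures the local "never decreases" condition.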

OK, it seems I have a reasonable answer. First, let's define binom(n,k) as the number of ways in which we can set k out of n bits. That's the classic Pascal triangle:

1  1
1  2  1
1  3  3  1
1  4  6  4  1
1  5 10 10  5  1
1  6 15 20 15  6  1
1  7 21 35 35 21  7  1
1  8 28 56 70 56 28  8  1
...

Easily calculated and cached. Note that the sum of each line is 1<<lineNumber.
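A minimal C++ sketch of building and caching that triangle (pascalTriangle is a hypothetical name). It assumes 64-bit counts, which hold every binom(n,k) for n up to 64.

```cpp
#include <cstdint>
#include <vector>

// binom[n][k] = number of ways to set k out of n bits (Pascal's triangle).
std::vector<std::vector<std::uint64_t>> pascalTriangle(unsigned maxBits)
{
    std::vector<std::vector<std::uint64_t>> binom(maxBits + 1);
    for (unsigned n = 0; n <= maxBits; ++n) {
        binom[n].assign(n + 1, 1);  // the edges of each row are 1
        for (unsigned k = 1; k < n; ++k)
            binom[n][k] = binom[n - 1][k - 1] + binom[n - 1][k];
    }
    return binom;
}
```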

The next thing we'll need is the partial_sum of each row of that triangle:

1  2
1  3  4
1  4  7  8
1  5 11 15 16
1  6 16 26 31  32
1  7 22 42 57  63  64
1  8 29 64 99  120 127 128
1  9 37 93 163 219 247 255 256 
...

Again, this table can be created by summing two values from the previous line, except that the new last entry on each line is now 1<<line instead of 1.
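The same recurrence in code, as a sketch (partialSums is a hypothetical name): partial[n][k] counts the n-bit values with at most k bits set, i.e. the running sum of row n of Pascal's triangle. The last entry of each row is set to 1<<n directly, and 64-bit counts are assumed, which limits the sketch to n <= 63 because 1<<64 does not fit.

```cpp
#include <cstdint>
#include <vector>

// partial[n][k] = number of n-bit values with at most k bits set.
// partial[n][k] = partial[n-1][k] + partial[n-1][k-1] for 0 < k < n,
// and the last entry of each row is 1 << n (all n-bit values).
std::vector<std::vector<std::uint64_t>> partialSums(unsigned maxBits)
{
    std::vector<std::vector<std::uint64_t>> partial(maxBits + 1);
    partial[0] = {1};  // one 0-bit value, with 0 bits set
    for (unsigned n = 1; n <= maxBits; ++n) {
        partial[n].resize(n + 1);
        partial[n][0] = 1;  // only the all-zero value
        for (unsigned k = 1; k < n; ++k)
            partial[n][k] = partial[n - 1][k] + partial[n - 1][k - 1];
        partial[n][n] = partial[n - 1][n - 1] * 2;  // 1 << n
    }
    return partial;
}
```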

Let's use these tables to construct f(x) for an 8-bit number (it trivially generalizes to any number of bits). f(0) still has to be 0. Looking up the 8th row in the first triangle, we see that the next 8 entries are f(1) to f(8), all with one bit set. The next 28 entries (7+6+5+4+3+2+1) all have 2 bits set, so that's f(9) to f(36). The next 56 entries, f(37) to f(92), have 3 bits set, and there are 70 entries with 4 bits set. From symmetry we can see that they're centered on the middle of the whole range (between f(127) and f(128)); in particular they're f(93) to f(162). And obviously, the only number with 8 bits set sorts last, as f(255).
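Counting these ranges by hand is error-prone, so here is a sketch that derives them: the indices i for which f(i) has exactly k bits set run from binom(n,0) + ... + binom(n,k-1) up to that sum plus binom(n,k), minus one. popcountRange is a hypothetical name, and the multiplicative update of binom(n,j) assumes n is small enough (n <= 32 here) that the intermediate product fits in 64 bits.

```cpp
#include <cstdint>
#include <utility>

// First and last index i for which f(i) has exactly k of n bits set (k <= n).
// Assumes n <= 32 so that binom * (n - j) cannot overflow 64 bits.
std::pair<std::uint64_t, std::uint64_t> popcountRange(unsigned n, unsigned k)
{
    std::uint64_t before = 0;  // values with fewer than k bits set
    std::uint64_t binom = 1;   // C(n, j), starting at C(n, 0)
    for (unsigned j = 0; j < k; ++j) {
        before += binom;
        binom = binom * (n - j) / (j + 1);  // C(n, j + 1), exact division
    }
    return {before, before + binom - 1};
}
```

For n = 8 this gives 1 to 8 for one bit, 9 to 36 for two bits, 37 to 92 for three bits and 93 to 162 for four bits.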

So, with these tables we can quickly determine how many bits must be set in f(i): just do a binary search in the last row of the partial-sum table. But that doesn't tell us exactly which bits are set. For that we need the previous rows.
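The binary search can be done with std::upper_bound: f(i) has k bits set for the smallest k whose partial-sum entry exceeds i. A sketch, assuming the last row is passed in as a vector (bitsSetInF is a hypothetical name) and that i is less than its final entry, 1<<n:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// lastRow[k] = number of n-bit values with at most k bits set, for example
// {1, 9, 37, 93, 163, 219, 247, 255, 256} for n = 8. Requires i < lastRow.back().
// f(i) has k bits set for the smallest k with i < lastRow[k].
unsigned bitsSetInF(const std::vector<std::uint64_t>& lastRow, std::uint64_t i)
{
    const auto it = std::upper_bound(lastRow.begin(), lastRow.end(), i);  // first entry > i
    return static_cast<unsigned>(it - lastRow.begin());
}
```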

The reason that each value in the table can be created from the previous line is simple: binom(n,k) == binom(n-1,k) + binom(n-1,k-1). There are two sorts of numbers with k bits set: those that start with a 0 and those that start with a 1. In the first case, the next n-1 bits must contain all k set bits; in the second case, the next n-1 bits must contain only k-1 set bits. The special cases are of course 0 out of n and n out of n.
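The split can be checked by brute force for small n. This sketch (splitByLeadingBit is a hypothetical helper, only sensible for small n, roughly 1 <= n <= 20) counts the n-bit values with k bits set, separated by their most significant bit; the two counts should equal binom(n-1,k) and binom(n-1,k-1).

```cpp
#include <bitset>
#include <utility>

// Brute force for small n: among the n-bit values with exactly k bits set,
// count those whose most significant bit is 0 and those where it is 1.
std::pair<unsigned, unsigned> splitByLeadingBit(unsigned n, unsigned k)
{
    unsigned startsWithZero = 0, startsWithOne = 0;
    for (unsigned v = 0; v < (1u << n); ++v) {
        if (std::bitset<32>(v).count() != k)
            continue;
        if ((v >> (n - 1)) & 1u)
            ++startsWithOne;
        else
            ++startsWithZero;
    }
    return {startsWithZero, startsWithOne};
}
```

For n = 8 and k = 2 it gives 21 and 7, the numbers used in the walk-through that follows.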

This same structure can be used to quickly tell us what f(15) must be. It must have 2 bits set: f(0) is 0 and f(1) to f(8) are the eight one-bit values, so the 28 two-bit values are f(9) to f(36). In particular, f(15) is number 6 among the values with 2 bits set (counting from 0, as usual). It's useful to define this as an offset in a range, as we'll try to shrink the length of this range from 28 down to 1.

We now subdivide that range into 21 values which start with a zero and 7 which start with a one. Since 6 < 21, we know that the first digit is a zero. Of the remaining 7 bits, still 2 need to be set, so we move up a line in the triangle and see that 15 values start with two zeroes, and 6 start with 01. Since 6 < 15, f(15) starts with 00. Going further up, 6 < 10, so it starts with 000. But 6 is not less than 6 (only 6 values start with 0000), so it starts with 0001 instead. At this point we change the start of the range, so the new offset becomes 0 (6-6).

We now only need to focus on the numbers that start with 0001 and have one extra bit, which are f(15)...f(18). It should be obvious by now that the range is f(15)=00010001, f(16)=00010010, f(17)=00010100, f(18)=00011000.
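For small widths the whole order can be generated by brute force, which gives an independent check of the walk-through. This sketch (referenceOrder is hypothetical) sorts all n-bit values by the number of bits set and keeps ties in increasing numeric order, which is the order the walk-through produces. A full table like this is exactly what the question wants to avoid for real widths, so it is only a test aid.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <numeric>
#include <vector>

// All n-bit values (small n only) sorted by number of bits set;
// stable_sort keeps values with equal bit counts in increasing numeric order.
std::vector<unsigned> referenceOrder(unsigned n)
{
    std::vector<unsigned> values(std::size_t{1} << n);
    std::iota(values.begin(), values.end(), 0u);
    std::stable_sort(values.begin(), values.end(), [](unsigned a, unsigned b) {
        return std::bitset<32>(a).count() < std::bitset<32>(b).count();
    });
    return values;
}
```

Counting from index 0, referenceOrder(8) has 00010001, 00010010, 00010100 and 00011000 at indices 15 to 18.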

So, to calculate each bit, we move one row up in the triangle and compare our "remainder" (the offset) with the number of values that start with a zero. If the offset is smaller, we append a zero; otherwise we append a one, subtract that count from the offset, and move left one column, because one fewer bit remains to be set. That means the computational complexity of f(x) is O(bits), or O(log N), and the storage needed is O(bits*bits).
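Putting the steps together, here is a minimal C++ sketch of f(i) for widths of up to 64 bits. It is an illustration, not tuned code: the table is rebuilt on every call instead of being cached, and step 1 uses a linear scan over row `width` of the triangle instead of the binary search, which keeps the whole function O(bits).

```cpp
#include <cstdint>
#include <vector>

// Sketch of f(i) for width-bit words (width <= 64); requires i < 2^width.
// Values come in order of increasing number of set bits; within one bit count
// they come in increasing numeric order, as in the walk-through above.
std::uint64_t f(std::uint64_t i, unsigned width)
{
    // binom[n][k] = C(n, k); entries with k > n stay 0.
    std::vector<std::vector<std::uint64_t>> binom(
        width + 1, std::vector<std::uint64_t>(width + 1, 0));
    for (unsigned n = 0; n <= width; ++n) {
        binom[n][0] = 1;
        for (unsigned k = 1; k <= n; ++k)
            binom[n][k] = binom[n - 1][k - 1] + binom[n - 1][k];
    }
    // Step 1: find k, the number of set bits, and the offset i within that block.
    unsigned k = 0;
    while (i >= binom[width][k]) {
        i -= binom[width][k];
        ++k;
    }
    // Step 2: decide the bits from the most significant one down.
    std::uint64_t result = 0;
    for (unsigned n = width; n > 0; --n) {
        const std::uint64_t startsWithZero = binom[n - 1][k];  // k ones left in n-1 bits
        result <<= 1;
        if (i >= startsWithZero) {  // skip the block that starts with 0: this bit is 1
            i -= startsWithZero;
            result |= 1;
            --k;
        }
    }
    return result;
}
```

With width = 8, f(15, 8) returns 0x11 (00010001) and f(18, 8) returns 0x18 (00011000). To avoid assuming 32 bits, pass sizeof(unsigned) * CHAR_BIT or std::numeric_limits<unsigned>::digits as the width.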

For each given number k we know that there are binom(n, k) n-bit integers that have exactly k bits of value one. We can now generate a lookup table of n + 1 integers that stores, for each k, how many numbers have fewer one bits. This lookup table can then be used to find the number o of one bits of f(i).
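A sketch of that lookup table in C++. fewerBitsTable and onesInF are hypothetical names; the width defaults to std::numeric_limits<unsigned>::digits rather than a hard-coded 32, and row n of Pascal's triangle is built in place so that all counts fit in 64 bits for widths up to 64.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// before[k] = how many n-bit values have fewer than k one bits, for k = 0..n
// (n + 1 entries). Defaults to the width of unsigned instead of assuming 32 bits.
std::vector<std::uint64_t> fewerBitsTable(unsigned n = std::numeric_limits<unsigned>::digits)
{
    std::vector<std::uint64_t> row(n + 1, 0);  // becomes row n of Pascal's triangle
    row[0] = 1;
    for (unsigned m = 1; m <= n; ++m)
        for (unsigned k = m; k > 0; --k)  // right to left, so row[k - 1] still holds row m-1
            row[k] += row[k - 1];
    std::vector<std::uint64_t> before(n + 1, 0);
    for (unsigned k = 1; k <= n; ++k)
        before[k] = before[k - 1] + row[k - 1];
    return before;
}

// The number o of one bits in f(i): the largest o with before[o] <= i.
unsigned onesInF(const std::vector<std::uint64_t>& before, std::uint64_t i)
{
    unsigned o = 0;
    while (o + 1 < before.size() && before[o + 1] <= i)
        ++o;
    return o;
}
```

Subtracting before[o] from i then leaves the permutation index p described next.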

Once we know this number, we subtract the lookup table value for this number of bits from i, which leaves us with the permutation index p among the numbers with the given number of one bits. Although I have not done research in this area, I am quite sure that there exists a method for finding the p-th permutation of a std::vector<bool> which is initialized with zeros and o ones in the lowest bits.
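Such a method does exist. One standard choice, swapped in here because the answer does not name one, is the combinatorial number system: every index p has a unique representation p = binom(c_o, o) + ... + binom(c_1, 1) with c_o > ... > c_1 >= 0, and setting bits c_o, ..., c_1 gives the p-th value with o one bits in increasing numeric order (so p = 0 is the value with o ones in the lowest bits). A sketch with hypothetical names, assuming widths of at most 32 bits so the products inside choose stay within 64 bits:

```cpp
#include <cstdint>

// C(n, k), computed multiplicatively; each intermediate value is itself a binomial
// coefficient, so the division is exact. Assumes n <= 32 to stay within 64 bits.
std::uint64_t choose(unsigned n, unsigned k)
{
    if (k > n)
        return 0;
    std::uint64_t r = 1;
    for (unsigned j = 1; j <= k; ++j)
        r = r * (n - k + j) / j;
    return r;
}

// The p-th value (counting from 0) with exactly o one bits, in increasing numeric
// order, via the combinatorial number system. Requires p < C(w, o) for word width w.
std::uint64_t valueFromPermutationIndex(std::uint64_t p, unsigned o)
{
    std::uint64_t value = 0;
    for (unsigned j = o; j > 0; --j) {
        unsigned c = j - 1;            // C(j - 1, j) = 0 <= p, a safe start
        while (choose(c + 1, j) <= p)  // find the largest c with C(c, j) <= p
            ++c;
        p -= choose(c, j);
        value |= std::uint64_t{1} << c;  // bit c is one of the o one bits
    }
    return value;
}
```

Combined with the lookup table, f(i) is then the value for p = i minus the table entry for o.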

The reverse function

Again the lookup table comes in handy. We can directly calculate the number of preceding numbers with fewer one bits by counting the one bits in the input integer and reading the lookup table. Then you "only" need to determine the permutation index, add it to the looked-up value, and you are done.
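A matching sketch of the reverse function, assuming the combinatorial number system as the permutation index (an assumption; the answer leaves the index unspecified): if the one bits of x sit at positions c_1 < c_2 < ... < c_o, the index within its group is binom(c_1,1) + ... + binom(c_o,o), and the count of values with fewer one bits is added on top. inverseF and choose are hypothetical names, and widths are limited to 32 bits here.

```cpp
#include <cstdint>

// C(n, k), computed multiplicatively (exact; n <= 32 keeps it within 64 bits).
std::uint64_t choose(unsigned n, unsigned k)
{
    if (k > n)
        return 0;
    std::uint64_t r = 1;
    for (unsigned j = 1; j <= k; ++j)
        r = r * (n - k + j) / j;
    return r;
}

// Position of x in the order "fewer one bits first, ties in increasing numeric
// order", for width-bit values (width <= 32 in this sketch).
std::uint64_t inverseF(std::uint64_t x, unsigned width)
{
    std::uint64_t p = 0;  // permutation index within the group of x
    unsigned o = 0;       // one bits seen so far, scanning from the lowest bit
    for (unsigned c = 0; c < width; ++c) {
        if ((x >> c) & 1u) {
            ++o;
            p += choose(c, o);  // term for the o-th lowest one bit, at position c
        }
    }
    std::uint64_t before = 0;  // lookup-table part: values with fewer than o one bits
    for (unsigned k = 0; k < o; ++k)
        before += choose(width, k);
    return before + p;
}
```

With width = 8, inverseF(0x11, 8) returns 15 and inverseF(0xFF, 8) returns 255.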

Disclaimer

Of course this is only a rough outline, and some parts (especially those involving the permutations) might take longer than it sounds.

Addition

You stated yourself:

I'm trying to use this to label a large data set, and prioritize the order in which I search them.

That sounds to me as if you will be going from low Hamming distance to high Hamming distance. In that case it would be enough to have an incremental version that generates the next number from the previous one:

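#include <algorithm>
#include <climits>
unsigned next(unsigned previous)
{
    constexpr int N = sizeof(unsigned) * CHAR_BIT;
    bool bits[N];                                    // bits[0] holds the most significant bit
    for (int i = 0; i < N; ++i) bits[i] = (previous >> (N - 1 - i)) & 1u;
    const int ones = static_cast<int>(std::count(bits, bits + N, true));
    if (!std::next_permutation(bits, bits + N))      // no larger value with this many ones,
        return ones + 1 >= N ? ~0u : (1u << (ones + 1)) - 1;  // so use one more bit
    unsigned result = 0;
    for (bool b : bits) result = (result << 1) | b;
    return result;
}

std::next_permutation cannot be applied to the bits of an unsigned directly, so they are first copied into a bool array, most significant bit first; the next permutation of that array is then the next larger value with the same number of one bits. ~0u has no successor and is returned unchanged.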
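The step "next larger value with the same number of one bits" can also be done directly on the integer, without std::next_permutation, using Gosper's hack, a well-known bit trick (swapped in here; the answer does not mention it). A sketch of the incremental generator built on it, with nextLabel as a hypothetical name:

```cpp
#include <climits>

// Next label after `previous` in order of increasing number of one bits, ties in
// increasing numeric order. ~0u has no successor and is returned unchanged.
unsigned nextLabel(unsigned previous)
{
    const unsigned lowest = previous & (0u - previous);  // lowest set bit, 0 if none
    const unsigned ripple = previous + lowest;  // 0 if previous is 0 or has no larger peer
    if (ripple != 0)                            // Gosper's hack
        return ripple | (((ripple ^ previous) >> 2) / lowest);
    // Move on to the smallest value with one more bit set.
    unsigned ones = 0;
    for (unsigned x = previous; x != 0; x &= x - 1)  // clears the lowest set bit each pass
        ++ones;
    const unsigned width = sizeof(unsigned) * CHAR_BIT;
    return ones + 1 >= width ? ~0u : (1u << (ones + 1)) - 1;
}
```

For the search in the question, one would check C ^ m for m = 0, nextLabel(0), nextLabel(nextLabel(0)), and so on, stopping after m == ~0u.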

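Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM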