Hamming weight based indexing

Assume we have an integer of bit size n = 4.
The problem I am describing is how to map a number to an array position based on its Hamming weight and its value, given the bit size. E.g. an array with 16 elements for bit size 4 could look like this:

|0|1|2|4|8|3|5|6|9|10|12|7|11|13|14|15|

Here the elements are grouped by their Hamming weight (necessary) and sorted by value within each group (not necessary). Sorting is not necessary as long as you can take, e.g., 3 (0011), do some operations, and get back index 5, then 5 (0101) -> 6, etc.

All combinations of n bits will be present and there will be no duplicates. E.g. a bit size of 3 would give the array:

|0|1|2|4|3|5|6|7|
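A brute-force sketch makes the required layout concrete and can serve as a test oracle for faster methods. It assumes the layout sorts by value inside each group (as in both arrays above) and simply stable-sorts all n-bit values by their popcount. The helper names hamming_order and index_table are hypothetical, and the tables have 2^n entries, so this is only practical for small n.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reference layout: all n-bit values, grouped by Hamming weight
// and in ascending order within each group.
std::vector<unsigned> hamming_order(int n) {
    assert(0 <= n && n <= 20);                    // 2^n entries, so keep n small
    std::vector<unsigned> order(std::size_t{1} << n);
    std::iota(order.begin(), order.end(), 0u);    // counting order 0, 1, ..., 2^n - 1
    // A stable sort on the weight alone keeps counting order inside each group.
    std::stable_sort(order.begin(), order.end(), [](unsigned a, unsigned b) {
        return std::bitset<32>(a).count() < std::bitset<32>(b).count();
    });
    return order;
}

// Inverse table: index_table(n)[v] is the array position of value v.
std::vector<unsigned> index_table(int n) {
    const std::vector<unsigned> order = hamming_order(n);
    std::vector<unsigned> index(order.size());
    for (unsigned i = 0; i < order.size(); ++i)
        index[order[i]] = i;
    return index;
}
```

For n = 4 and n = 3 this reproduces the two arrays above, and index_table(4) maps 3 to 5 and 5 to 6, as in the examples.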

I would prefer a solution without loops, or any papers that discuss similar solutions, or, failing that, just any ideas on how you could go about doing this.

Note that you can enumerate numbers with the same Hamming weight (in counting order) using the following functions:

int next(int n) { // get the next one with same # of bits set
  int lo = n & -n;       // lowest one bit
  int lz = (n + lo) & ~n;      // lowest zero bit above lo
  n |= lz;                     // add lz to the set
  n &= ~(lz - 1);              // reset bits below lz
  n |= (lz / lo / 2) - 1;      // put back right number of bits at end
  return n;
}

int prev(int n) { // get the prev one with same # of bits set
   int y = ~n;
   y &= -y; // lowest zero bit
   n &= ~(y-1); // reset all bits below y
   int z = n & -n; // lowest set bit
   n &= ~z;        // clear z bit
   n |= (z - z / (2*y)); // add required number of bits below z
   return n;
 }
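To see next() at work, the sketch below walks one weight group from its smallest member, 2^k - 1, until the value no longer fits in n bits. It reuses next()'s bit trick on unsigned values (renamed next_same_weight to avoid clashing with std::next) and adds a hypothetical helper, same_weight_values; it assumes 1 <= k <= n <= 31 so that nothing overflows.

```cpp
#include <cassert>
#include <vector>

// Same bit trick as next() above, on unsigned values.
unsigned next_same_weight(unsigned n) {
    unsigned lo = n & -n;          // lowest one bit
    unsigned lz = (n + lo) & ~n;   // lowest zero bit above lo
    n |= lz;                       // add lz to the set
    n &= ~(lz - 1);                // reset bits below lz
    n |= (lz / lo / 2) - 1;        // put back the right number of bits at the end
    return n;
}

// Hypothetical helper: every n-bit value with exactly k bits set, in counting order.
std::vector<unsigned> same_weight_values(int n, int k) {
    assert(1 <= k && k <= n && n <= 31);   // k = 0 would divide by zero above
    std::vector<unsigned> values;
    for (unsigned v = (1u << k) - 1; v < (1u << n); v = next_same_weight(v))
        values.push_back(v);
    return values;
}
```

same_weight_values(4, 2) yields 3, 5, 6, 9, 10, 12, which is the weight-2 block of the 4-bit array in the question.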

As an example, repeated application of prev() on x = 5678:

0: 00000001011000101110 (5678)
1: 00000001011000101101 (5677)
2: 00000001011000101011 (5675)
3: 00000001011000100111 (5671)
4: 00000001011000011110 (5662)
5: 00000001011000011101 (5661)
6: 00000001011000011011 (5659)
.....

Hence, in theory, you can compute the index of a number by applying prev() repeatedly and counting the steps. However, this can take very long. A better approach is to "jump" over whole blocks of combinations.
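The slow approach can be sketched as counting prev() steps until the smallest value of the same weight, 2^k - 1, is reached; that count is the index within the group. This is an illustration only: prev_same_weight is prev() on unsigned values (assumed 32 bits wide), naive_index is a hypothetical name, and the running time is proportional to the index itself.

```cpp
// Same bit trick as prev() above, on unsigned values.
constexpr unsigned prev_same_weight(unsigned n) {
    unsigned y = ~n;
    y &= -y;                  // lowest zero bit
    n &= ~(y - 1);            // reset all bits below y
    unsigned z = n & -n;      // lowest set bit above y
    n &= ~z;                  // clear z bit
    n |= z - z / (2 * y);     // add the required number of bits below z
    return n;
}

// Hypothetical naive index: the number of prev() steps from x down to the
// smallest value of the same weight, 2^k - 1.
constexpr unsigned long long naive_index(unsigned x) {
    int k = 0;
    for (unsigned y = x; y != 0; y &= y - 1)
        ++k;                                          // Hamming weight of x
    unsigned smallest = (k == 32) ? ~0u : (1u << k) - 1;
    unsigned long long steps = 0;
    while (x != smallest) {
        x = prev_same_weight(x);
        ++steps;
    }
    return steps;
}
```

For the 4-bit weight-2 group, naive_index(12) is 5, since 12 is the last of its six members 3, 5, 6, 9, 10, 12.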

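There are 2 rules (bit patterns are written most significant bit first, so the rule looks at the low end of the number):

 1. If the number ends in ..XXX10..01..1 (a one, then zeros, then a run of ones in the lowest bits), we can replace it by ..XXX00..01..1, moving that one down to join the run, and add the corresponding number of skipped combinations.
 2. If the number ends in ..XXX1..10..0 (a run of ones followed by zeros), again replace it by ..XXX0..01..1, moving the whole run to the bottom, and add the corresponding number of skipped combinations.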
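The number of combinations skipped by each rule follows from counting the same-weight values between x and its replacement x'. The symbols below are introduced here for the derivation and are not part of the original rules: G is the untouched high part; in rule 1, p is the position of the isolated one and t >= 1 the length of the trailing run of ones; in rule 2, s >= 1 is the number of trailing zeros and m the length of the run of ones above them.

```latex
% Rule 1: bit t is zero, t < p, and G has no set bits below position p+1
x = G + 2^{p} + (2^{t} - 1), \qquad x' = G + (2^{t+1} - 1), \qquad
\mathrm{index}(x) = \mathrm{index}(x') + \binom{p}{t+1}

% Rule 2: bit s+m is zero, s \ge 1, and G has no set bits below position s+m+1
x = G + (2^{m} - 1)\,2^{s}, \qquad x' = G + (2^{m} - 1), \qquad
\mathrm{index}(x) = \mathrm{index}(x') + \binom{s+m}{m} - 1
```

In rule 1 the skipped values keep G, leave bit p clear and place t + 1 ones among the p low bits; x' is the smallest of these C(p, t+1) values and x comes right after the largest. In rule 2, x and x' are the largest and smallest of the C(s+m, m) values that share G, so they are C(s+m, m) - 1 positions apart. Since 2^k - 1 has index 0, applying the rules until x reaches it expresses the index as a sum of such binomials.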

The following algorithm computes the index of a number among the numbers with the same Hamming weight (I did not bother with a fast implementation of the binomial coefficient):

#include <bitset>
#include <iostream>

#define LOG2(x) (__builtin_ffs(x)-1) // position of the lowest set bit (GCC/Clang builtin)

int C(int n, int k) { // simple implementation of the binomial coefficient
  if (k < 0 || k > n)
    return 0;
  if (k > n - k)
    k = n - k;                  // C(n, k) == C(n, n-k); loop over the smaller one
  long long b = 1;
  for (int i = 1; i <= k; i++)
    b = b * (n - k + i) / i;    // exact: b == C(n-k+i, i) after this step
  return (int)b;
}

int position_jumping(unsigned x) {
  int index = 0;
  if (x == 0) // weight 0 has a single member
    return 0;
  while (1) {
    if (x & 1) { // rule 1: x is of the form ..XXX10..01..1
      unsigned y = ~x;
      unsigned lo = y & -y;        // lowest zero bit
      unsigned xz = x & ~(lo-1);   // reset all bits below lo
      unsigned lz = xz & -xz;      // lowest one bit above lo
      if (lz == 0)                 // x == 0..01..1: we are in the first position!
        return index;

      int nn = LOG2(lz), kk = LOG2(lo)+1;
      index += C(nn, kk);          // C(p, t+1) with p = log2(lz), t = log2(lo) trailing ones

      x &= ~lz;                    // clear lz bit
      x |= lo;                     // add lo
    } else { // rule 2: x is of the form ..XXX01..10..0
      unsigned lo = x & -x;              // lowest set bit
      unsigned lz = (x + lo) & ~x;       // lowest zero bit above lo; 0 if the run reaches bit 31
      int nn = (lz == 0) ? 32 : LOG2(lz);
      int kk = nn - LOG2(lo);            // length of the run of ones
      x = (lz == 0) ? 0 : (x & ~(lz-1)); // clear all bits below lz
      x |= (1u << kk) - 1;               // move the run of ones to the bottom
      index += C(nn, kk) - 1;            // old and new x are the last and first of C(nn, kk) values
    }
    std::cout << "x: " << std::bitset<20>(x).to_string() << "; pos: " << index << "\n"; // trace each jump
  }
}

For example, given the number x = 5678, the algorithm computes its index in just five jumps:

  x: 00000001011000100111; pos: 3
  x: 00000001011000001111; pos: 8
  x: 00000001010000011111; pos: 134
  x: 00000001000000111111; pos: 344
  x: 00000000000001111111; pos: 1136

Note that 1136 is the (0-based) position of 5678 within the group of numbers with the same Hamming weight. Hence you would have to shift this index by the count of all numbers with smaller Hamming weights.
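A closed-form alternative to jumping is the combinatorial number system: the index of x within its weight group is the sum of C(c_i, i) over its set bit positions c_1 < c_2 < ... (with i counted from 1). The sketch below uses that formula (a swapped-in technique, not the code above) and then adds the number of n-bit values of smaller weight to get the position in the full array. The names binom, index_in_group and hamming_array_index are hypothetical, and 32-bit inputs are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Exact binomial coefficient C(n, k); intermediate values stay small for n <= 32.
std::uint64_t binom(int n, int k) {
    if (k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;
    std::uint64_t b = 1;
    for (int i = 1; i <= k; ++i)
        b = b * (n - k + i) / i;   // exact: b == C(n-k+i, i) after this step
    return b;
}

// Index of x within its weight group (counting order): the sum of C(c_i, i)
// over the set bit positions c_1 < c_2 < ... of x.
std::uint64_t index_in_group(std::uint32_t x) {
    std::uint64_t index = 0;
    int i = 0;                      // number of set bits seen so far
    for (int c = 0; c < 32; ++c)
        if ((x >> c) & 1u)
            index += binom(c, ++i);
    return index;
}

// Position of x in the full array for bit size n: skip all groups of smaller
// weight (the sum of C(n, j) for j < weight), then add the in-group index.
std::uint64_t hamming_array_index(std::uint32_t x, int n) {
    assert(0 <= n && n <= 32);
    assert(n == 32 || x < (std::uint64_t{1} << n));   // x must fit in n bits
    int weight = 0;
    for (std::uint32_t y = x; y != 0; y &= y - 1)
        ++weight;                                    // Kernighan's bit count
    std::uint64_t offset = 0;
    for (int j = 0; j < weight; ++j)
        offset += binom(n, j);
    return offset + index_in_group(x);
}
```

For example, hamming_array_index(14, 4) returns 14 and hamming_array_index(3, 4) returns 5, matching the 4-bit array in the question. The loops run at most 32 times regardless of how large the index is.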

Here is a proof of concept, just to get the discussion started.
Step one is the hardest; it is solved here by using an approximation to compute the factorials.
Any more bright ideas?

Ideone link

#include <stdio.h>
#include <math.h>

//gamma function using Lanczos approximation formula
//output result in log base e
//use exp() to convert back
//has a nice side effect: can store large values in small [power of e] form
double logGamma(double x)
{
    double tmp = (x-0.5) * log(x+4.5) - (x+4.5);
    double ser = 1.0 + 76.18009173     / (x+0) - 86.50532033    / (x+1)
                     + 24.01409822     / (x+2) -  1.231739516   / (x+3)
                     +  0.00120858003  / (x+4) -  0.00000536382 / (x+5);
    return tmp + log(ser * 2.5066282746310005);  // sqrt(2*pi); M_PI is not part of standard C
}

//logGamma(n) is ln((n-1)!), so logGamma(n+1) is ln(n!)
double combination(int n, int r)
{
    return exp(logGamma(n+1)-( logGamma(r+1) + logGamma(n-r+1) ));
}

//primitive hamming weight counter
int hWeight(int x)
{
    int count, y;
    for (count=0, y=x; y; count++)
        y &= y-1; 
    return count;
}

//-------------------------------------------------------------------------------------
//recursively sum the member counts of groups 0..hw: C(bitsize,hw) + ... + C(bitsize,0)
int rCummGroupCount(int bitsize, int hw)
{
    if (hw < 0)
        return 0;
    else
        return (int)lround(combination(bitsize, hw)) + rCummGroupCount(bitsize,hw-1);
}
//-------------------------------------------------------------------------------------

int main(int argc, char* argv[])
{
    int bitsize = 4, integer = 14;
    int hw = hWeight(integer);
    int groupStartIndex = rCummGroupCount(bitsize,hw-1);
    printf("bitsize: %d\n", bitsize);
    printf("integer: %d  hamming weight: %d\n", integer, hw);
    printf("group start index: %d\n", groupStartIndex);
}

Output:

bitsize: 4
integer: 14  hamming weight: 3
group start index: 11
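For bit sizes up to 32 the binomials involved are small (the largest, C(32, 16), is 601080390), so the group start can also be computed exactly, without the gamma approximation. The sketch below is a C++ rewrite under that assumption, not the answer's code: popcount32 is the classic loop-free SWAR bit count (C++20 also offers std::popcount in <bit>), and group_start_index is a hypothetical name for the running sum of binomials, using the exact identity C(n, j+1) = C(n, j) * (n - j) / (j + 1).

```cpp
#include <cassert>
#include <cstdint>

// Loop-free Hamming weight of a 32-bit value (classic SWAR bit count):
// add bits in pairs, then nibbles, then sum the four bytes with one multiply.
constexpr int popcount32(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                  // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
    return static_cast<int>((x * 0x01010101u) >> 24);  // total lands in the top byte
}

// Hypothetical exact counterpart of rCummGroupCount(n, hw - 1): the first array
// index of weight group hw is the number of n-bit values with fewer than hw
// bits set, i.e. the sum of C(n, j) for j < hw.
std::uint64_t group_start_index(int n, int hw) {
    assert(0 <= n && n <= 32 && 0 <= hw && hw <= n + 1);
    std::uint64_t start = 0;
    std::uint64_t c = 1;                    // C(n, 0)
    for (int j = 0; j < hw; ++j) {
        start += c;
        c = c * (n - j) / (j + 1);          // C(n, j+1), the division is exact
    }
    return start;
}
```

group_start_index(4, popcount32(14)) gives 11, the same group start index as in the output above.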
