What am I doing wrong in this Bloom filter implementation?

I have this bit table for a segmented Bloom filter. Each row is managed by a single hash function.

unsigned char bit_table_[ROWS][COLUMNS]; // bit_table_ now has 8*ROWS*COLUMNS bits
unsigned char bit_mask[bits_per_char] = { 0x01,0x02,0x04,0x08,
                                          0x10,0x20,0x40,0x80};
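To make the layout concrete: each row of `bit_table_` holds `COLUMNS * 8` bits, and a bit position p within a row lives in byte `p / 8` under the mask `bit_mask[p % 8]`. A minimal sketch, assuming a hypothetical `COLUMNS = 16`; `BitPos` and `locate` are illustrative names, not part of the question's code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical size; the question does not give a value for COLUMNS.
constexpr std::size_t COLUMNS = 16;
constexpr std::size_t bits_per_char = 8;

// Same masks as in the question: bit_mask[k] has only bit k set.
const unsigned char bit_mask[bits_per_char] = {0x01, 0x02, 0x04, 0x08,
                                               0x10, 0x20, 0x40, 0x80};

// Location of one bit inside a row of COLUMNS bytes.
struct BitPos {
    std::size_t byte;    // index into bit_table_[i][...]
    unsigned char mask;  // single-bit mask within that byte
};

// Map a bit position p in [0, COLUMNS * 8) to its byte index and mask.
BitPos locate(std::size_t p) {
    assert(p < COLUMNS * bits_per_char);
    return BitPos{p / bits_per_char, bit_mask[p % bits_per_char]};
}
```

For example, position 83 is byte 10, mask `0x08`.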

There are ROWS hash functions, each of which handles setting and checking COLUMNS*8 bits.
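For intuition about how quickly such a filter fills up: with k = ROWS rows of m = 8·COLUMNS bits each, after n insertions, and assuming each hash function picks a uniformly random position within its own row, the standard estimate is:

```latex
\begin{aligned}
\Pr[\text{a given bit of a row is set}] &= 1 - \left(1 - \tfrac{1}{m}\right)^{n} \approx 1 - e^{-n/m},\\
p_{\text{fp}} &\approx \left(1 - e^{-n/m}\right)^{k}.
\end{aligned}
```

If only some of a row's m bits can ever be reached, m in these formulas effectively shrinks and the false-positive rate climbs much sooner than intended.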

Elements are hashed, and bit_index and bit are calculated as:

void compute_indices(unsigned int hash)
{
    bit_index = hash % COLUMNS;  // which byte (column) in the row
    bit = bit_index % 8;         // which bit inside that byte
}

Insertion is done as:

for (std::size_t i = 0; i < ROWS; ++i)
{
    hash = compute_hash(i, set_element);
    compute_indices(hash);
    bit_table_[i][bit_index] |= bit_mask[bit];
}

And the query is:

for (std::size_t i = 0; i < ROWS; ++i)
{
    hash = compute_hash(i, set_element);
    compute_indices(hash);

    if ((bit_table_[i][bit_index] & bit_mask[bit]) != bit_mask[bit])
    {
        return false;
    }
}
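The query test `(bit_table_[i][bit_index] & bit_mask[bit]) != bit_mask[bit]` is itself fine: for a single-bit mask it is equivalent to the more common `(byte & mask) == 0`. A small exhaustive check over every byte value and every one-bit mask (an illustrative sketch; the function name is hypothetical):

```cpp
#include <cassert>

// Returns true if, for every byte value and every single-bit mask,
// "(b & m) != m" and "(b & m) == 0" give the same answer.
bool single_bit_tests_agree() {
    for (unsigned b = 0; b < 256; ++b) {
        for (unsigned k = 0; k < 8; ++k) {
            const unsigned m = 1u << k;             // one-bit mask, like bit_mask[k]
            const bool not_all_set = (b & m) != m;  // form used in the question
            const bool is_clear = (b & m) == 0;     // common alternative
            if (not_all_set != is_clear) return false;
        }
    }
    return true;
}
```

So the query loop is not where the filter loses capacity.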

My problem is that the Bloom filter fills up too soon, and I suspect I am not using the individual bits of the chars correctly. For example, I suppose I should have something like:

bit_table_[i][bit_index][bit] |= bit_mask[bit];

for insertion, but since bit_table_ is declared as a two-dimensional array, I am not allowed to do this.
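C++ cannot index a single bit of an `unsigned char` with `[]`, so `bit_table_[i][bit_index][bit]` cannot work. Two common options are to keep the char array and address bits with masks (byte `pos / 8`, mask `1 << (pos % 8)`), or to let `std::bitset` do that bookkeeping. A sketch of the second option, assuming hypothetical `ROWS = 4` and `COLUMNS = 16`; `rows`, `set_bit` and `test_bit` are illustrative names:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical sizes; the question does not give values.
constexpr std::size_t ROWS = 4;
constexpr std::size_t COLUMNS = 16;

// One std::bitset per row: row i owns COLUMNS * 8 individually addressable bits.
// Objects with static storage start with every bit cleared.
std::bitset<COLUMNS * 8> rows[ROWS];

// Set bit position pos (0 <= pos < COLUMNS * 8) in row i.
void set_bit(std::size_t i, std::size_t pos) {
    assert(i < ROWS);
    rows[i].set(pos);  // std::bitset::set throws std::out_of_range if pos is too large
}

// Test bit position pos in row i.
bool test_bit(std::size_t i, std::size_t pos) {
    assert(i < ROWS);
    return rows[i].test(pos);  // also bounds-checked
}
```

With this layout a hash only needs to be reduced to `hash % (COLUMNS * 8)`; the byte/bit split happens inside `std::bitset`.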

What should I do to make use of the individual bits of the char array?

English is my second language, so my question may be hard to follow. I would be happy to explain my points further if needed.

EDIT: compute_hash(i, set_element) uses predefined salt values to compute the hash value of the element being inserted or queried.
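The question does not show `compute_hash`, so here is one hypothetical way a salted per-row hash could look: 32-bit FNV-1a with the salt XORed into the offset basis. The salt values below are made up; they only need to be distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical salts, one per row / hash function; the real ones are not shown.
constexpr std::uint32_t salts[] = {0x9E3779B9u, 0x85EBCA6Bu, 0xC2B2AE35u, 0x27D4EB2Fu};
constexpr std::size_t ROWS = sizeof(salts) / sizeof(salts[0]);

// Salted 32-bit FNV-1a: start from the standard offset basis mixed with the
// row's salt, then XOR in each byte and multiply by the FNV prime.
std::uint32_t compute_hash(std::size_t i, const std::string& element) {
    assert(i < ROWS);
    std::uint32_t h = 2166136261u ^ salts[i];  // FNV-1a offset basis, salted
    for (unsigned char c : element) {
        h ^= c;
        h *= 16777619u;  // 32-bit FNV prime
    }
    return h;
}
```

Each FNV-1a step (XOR with a byte, multiply by an odd constant modulo 2^32) is invertible, so distinct salts always give distinct hashes for the same element. That alone does not guarantee the rows behave like independent hash functions, but it is a common, cheap approach.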

There is an error in your compute_indices method.

You compute a column index and then take that column index modulo 8, so a given column always uses the same bit. For example, column 10 always uses bit 2. As a result, only one of the 8 bits in each byte can ever be set: each row effectively has COLUMNS usable bits instead of COLUMNS * 8, and the filter behaves like one an eighth of the intended size.
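The effect is easy to measure. A minimal sketch, assuming a hypothetical `COLUMNS = 16`; `question_indices` and `reachable_positions` are illustrative names, and the mapping returns `(bit_index, bit)` as a pair instead of writing globals:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

constexpr std::size_t COLUMNS = 16;  // hypothetical size

// The question's mapping, returned as (bit_index, bit) instead of globals.
// Note that bit depends only on bit_index, not on the rest of the hash.
std::pair<std::size_t, std::size_t> question_indices(unsigned int hash) {
    std::size_t bit_index = hash % COLUMNS;
    std::size_t bit = bit_index % 8;
    return {bit_index, bit};
}

// Count the distinct (byte, bit) positions reached by hashes 0 .. limit-1.
std::size_t reachable_positions(unsigned int limit) {
    std::set<std::pair<std::size_t, std::size_t>> seen;
    for (unsigned int h = 0; h < limit; ++h) seen.insert(question_indices(h));
    return seen.size();
}
```

However many hash values are tried, `reachable_positions` never exceeds `COLUMNS` (16 here), out of the `COLUMNS * 8` (128) bits available in the row.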

You should have:

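void compute_indices(unsigned int hash)
{
    // Pick one of the COLUMNS * 8 bits in the row, then split it into
    // a byte (column) index and a bit offset within that byte.
    unsigned int bitIndex = hash % (COLUMNS * 8);
    bit_index = bitIndex / 8;
    bit = bitIndex % 8;
}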
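Putting the fix together, here is a minimal self-contained sketch of the whole filter. It assumes hypothetical `ROWS = 4`, `COLUMNS = 16`, made-up salts, and a salted FNV-1a hash standing in for the question's `compute_hash`; `SegmentedBloomFilter`, `contains` and `bits_set` are illustrative names. Insertion and query go through the same `compute_indices`, so they always agree on which bit to touch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sizes and salts; the question does not give real values.
constexpr std::size_t ROWS = 4;
constexpr std::size_t COLUMNS = 16;
constexpr std::uint32_t salts[ROWS] = {0x9E3779B9u, 0x85EBCA6Bu, 0xC2B2AE35u, 0x27D4EB2Fu};

// Hypothetical salted FNV-1a, standing in for the question's compute_hash.
std::uint32_t compute_hash(std::size_t i, const std::string& element) {
    assert(i < ROWS);
    std::uint32_t h = 2166136261u ^ salts[i];
    for (unsigned char c : element) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

class SegmentedBloomFilter {
public:
    void insert(const std::string& element) {
        for (std::size_t i = 0; i < ROWS; ++i) {
            std::size_t byte, bit;
            compute_indices(compute_hash(i, element), byte, bit);
            table_[i][byte] |= static_cast<unsigned char>(1u << bit);
        }
    }

    bool contains(const std::string& element) const {
        for (std::size_t i = 0; i < ROWS; ++i) {
            std::size_t byte, bit;
            compute_indices(compute_hash(i, element), byte, bit);
            if ((table_[i][byte] & (1u << bit)) == 0) return false;  // definitely absent
        }
        return true;  // possibly present
    }

    // Number of set bits across the whole table, useful for watching the fill rate.
    std::size_t bits_set() const {
        std::size_t n = 0;
        for (const auto& row : table_)
            for (unsigned char b : row)
                for (unsigned k = 0; k < 8; ++k) n += (b >> k) & 1u;
        return n;
    }

private:
    // The answer's fix: choose one of COLUMNS * 8 bits, then split it
    // into a byte index and a bit offset inside that byte.
    static void compute_indices(std::uint32_t hash, std::size_t& byte, std::size_t& bit) {
        std::size_t pos = hash % (COLUMNS * 8);
        byte = pos / 8;
        bit = pos % 8;
    }

    unsigned char table_[ROWS][COLUMNS] = {};  // all bits start cleared
};
```

Each insertion sets exactly one bit in each row (possibly one that was already set), and with the fixed indices every row has `COLUMNS * 8` usable positions rather than `COLUMNS`.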

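Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.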

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM