Java multi-bit / compact small integer array

I am working on implementing some Bloom filter variants, and a very useful data structure for this would be a compact multi-bit array; that is, an array where each element is a compact integer of around 4 bits.

Space efficiency is of the utmost importance here, so while a plain integer array would give me the functionality I want, it would be bulkier than necessary.

Before I try to implement this functionality myself with bit arithmetic, I was wondering if anyone knows of a library out there that already provides such a data structure.

Edit: Static size is fine. The ideal case would be an implementation that is flexible with regard to the number of bits per cell. That might be a bit much to hope for though (no pun intended?).

If you aren't modifying the array after creation, java.util.BitSet does the bit masking for you, but it is slow to access: you have to fetch each bit individually and then shift and OR the bits together yourself to rebuild the int from its 4 bits.

Having said that, writing it yourself might be the best way to go. Doing the bit arithmetic yourself isn't that difficult, since there are only 2 values per byte: the high 4 bits decode as (array[i] & 0xF0) >> 4 and the low 4 bits as array[i] & 0x0F.
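The masking described above could be wrapped up roughly as follows. This is only a minimal sketch: the class name NibbleArray, its methods, and the choice to put even indices in the low nibble and odd indices in the high nibble are assumptions made for illustration, not part of any existing library.

```java
/** Hypothetical fixed-size array of 4-bit values (0..15), packed two per byte. */
final class NibbleArray {
    private final byte[] data;
    private final int size;

    NibbleArray(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        this.size = size;
        // Two cells per byte, rounded up for odd sizes.
        this.data = new byte[(size >>> 1) + (size & 1)];
    }

    int size() {
        return size;
    }

    /** Returns the 4-bit value stored at the given cell index. */
    int get(int index) {
        checkIndex(index);
        int b = data[index >> 1] & 0xFF; // mask to undo sign extension of the byte
        // Assumed layout: even index -> low nibble, odd index -> high nibble.
        return (index & 1) == 0 ? (b & 0x0F) : ((b & 0xF0) >> 4);
    }

    /** Stores a value in the range 0..15 at the given cell index. */
    void set(int index, int value) {
        checkIndex(index);
        if (value < 0 || value > 15) {
            throw new IllegalArgumentException("value must be in 0..15: " + value);
        }
        int pos = index >> 1;
        int b = data[pos] & 0xFF;
        if ((index & 1) == 0) {
            b = (b & 0xF0) | value;        // keep high nibble, replace low
        } else {
            b = (b & 0x0F) | (value << 4); // keep low nibble, replace high
        }
        data[pos] = (byte) b;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
    }
}
```

With this layout, a million 4-bit cells take about 500,000 bytes of payload, compared with about 4,000,000 bytes for an int[] of the same length.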

Take a look at the compressed BitSet provided by http://code.google.com/p/javaewah/ . It lets you set bits freely and relies on compression to keep its memory use low.

I.e., something like

        EWAHCompressedBitmap32 set = new EWAHCompressedBitmap32();
        set.set(0);
        set.set(1000000);

will still only occupy a few bytes, rather than the roughly 125 KB (about a million bits) that java.util.BitSet would need...

You should be able to map the 4-bit integers onto the BitSet by multiplying the element index by 4: element i then occupies bits 4*i through 4*i + 3.
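The index mapping can be sketched as below. To keep the example self-contained it uses the standard java.util.BitSet as a stand-in rather than the compressed bitmap; the same index arithmetic should carry over, but that is an assumption, not something tested against JavaEWAH. The class name BitSetCellArray and the configurable bitsPerCell parameter are hypothetical, added to illustrate the "flexible number of bits per cell" wish from the question.

```java
import java.util.BitSet;

/** Hypothetical fixed-size array of small unsigned integers stored in a BitSet. */
final class BitSetCellArray {
    private final BitSet bits;
    private final int bitsPerCell;
    private final int size;

    BitSetCellArray(int size, int bitsPerCell) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        if (bitsPerCell < 1 || bitsPerCell > 31) {
            throw new IllegalArgumentException("bitsPerCell must be in 1..31");
        }
        this.size = size;
        this.bitsPerCell = bitsPerCell;
        // multiplyExact guards against int overflow in the index arithmetic below.
        this.bits = new BitSet(Math.multiplyExact(size, bitsPerCell));
    }

    /** Rebuilds the cell value bit by bit, least significant bit first. */
    int get(int index) {
        checkIndex(index);
        int base = index * bitsPerCell; // first bit of this cell
        int value = 0;
        for (int k = 0; k < bitsPerCell; k++) {
            if (bits.get(base + k)) {
                value |= 1 << k;
            }
        }
        return value;
    }

    /** Writes the low bitsPerCell bits of value into the cell. */
    void set(int index, int value) {
        checkIndex(index);
        if ((value >>> bitsPerCell) != 0) { // also rejects negative values
            throw new IllegalArgumentException("value does not fit in " + bitsPerCell + " bits");
        }
        int base = index * bitsPerCell;
        for (int k = 0; k < bitsPerCell; k++) {
            bits.set(base + k, ((value >>> k) & 1) != 0);
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + size);
        }
    }
}
```

Every read and write here touches bitsPerCell individual bits, which is the per-bit access cost mentioned for BitSet earlier in the thread.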
