Performance implications of using Java BigInteger for a huge bitmask

We have an interesting challenge. We have to control access to data that reside in "bins". There will potentially be hundreds of thousands of "bins". Access to each bin is controlled individually, but the restrictions can, and probably will, overlap. We are thinking of assigning each bin a position in a bitmask (1, 2, 3, 4, etc.).

Then, when a user logs into the system, we look at his security attributes and determine which bins he's allowed to see. With that info we construct a bitmask for this user where the set bits correspond to the identifiers of the bins he's allowed to see. So if he can see bins 1, 3 and 4, his bitmask would be 1101 (with bin 1 as the rightmost, least significant bit).
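A minimal sketch of building such a mask with BigInteger, assuming bin b is stored at bit position b - 1 (so bin 1 is the rightmost bit). The class and method names are illustrative only.

```java
import java.math.BigInteger;
import java.util.List;

public class UserMaskExample {
    // Assumption: bin number b maps to bit position b - 1,
    // so bin 1 is the least significant (rightmost) bit.
    static BigInteger maskFor(List<Integer> allowedBins) {
        BigInteger mask = BigInteger.ZERO;
        for (int bin : allowedBins) {
            // setBit returns a new BigInteger; the previous value is left unchanged
            mask = mask.setBit(bin - 1);
        }
        return mask;
    }

    public static void main(String[] args) {
        BigInteger mask = maskFor(List.of(1, 3, 4));
        System.out.println(mask.toString(2)); // prints 1101
    }
}
```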

So when a user searches the data, we can look at the bin index of each returned row and check whether that bit is set in his bitmask. If it is, we let him see that row. We are planning to store the bitmask as a BigInteger in Java.
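The row check could look roughly like this with `BigInteger.testBit`. This is a sketch under assumptions: `Row` is a hypothetical stand-in for a search result, and bin b is again assumed to live at bit b - 1.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.stream.Collectors;

public class RowFilterExample {
    // Hypothetical search-result row; only the bin index matters here.
    record Row(String data, int bin) {}

    // A row is visible if the bit for its bin (assumed to be bin - 1) is set.
    static boolean canSee(BigInteger userMask, Row row) {
        return userMask.testBit(row.bin() - 1);
    }

    static List<Row> visibleRows(BigInteger userMask, List<Row> rows) {
        return rows.stream()
                .filter(r -> canSee(userMask, r))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        BigInteger mask = new BigInteger("1101", 2); // bins 1, 3 and 4
        List<Row> rows = List.of(new Row("a", 1), new Row("b", 2), new Row("c", 4));
        System.out.println(visibleRows(mask, rows)); // keeps rows "a" and "c"
    }
}
```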

My question is: assuming the index number doesn't get bigger than Integer.MAX_VALUE, is a BigInteger bitmask going to scale to hundreds of thousands of bit positions? Would it take forever to run BigInteger.testBit(n) where n could be huge (e.g. 874,837)? Would it take forever to create such a BigInteger?

And secondly: if you have an alternative approach, I'd love to hear it.

BigInteger should be fast if you don't change it often.

A more obvious choice would be BitSet, which is designed for this sort of thing. For looking up bits, I suspect the performance is similar. For creating or modifying the mask, BitSet would be more efficient.
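For comparison, a minimal BitSet version of the same mask. It assumes bin b is stored at bit index b - 1; the names are illustrative. Unlike BigInteger, `set` modifies the BitSet in place instead of producing a copy.

```java
import java.util.BitSet;

public class BitSetMaskExample {
    // Assumption: bin b is stored at bit index b - 1.
    static BitSet maskFor(int... allowedBins) {
        BitSet mask = new BitSet(); // grows automatically as bits are set
        for (int bin : allowedBins) {
            mask.set(bin - 1); // modifies this BitSet in place, no copy
        }
        return mask;
    }

    static boolean canSee(BitSet mask, int bin) {
        // get() returns false for any index past the highest set bit
        return mask.get(bin - 1);
    }

    public static void main(String[] args) {
        BitSet mask = maskFor(1, 3, 4, 874_838);
        System.out.println(canSee(mask, 3));       // true
        System.out.println(canSee(mask, 2));       // false
        System.out.println(canSee(mask, 874_838)); // true
    }
}
```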

Note: PaulG has commented that the difference is "impressive" and that BitSet is faster.

Java has a more convenient class for this, called BitSet.

You do not need to check in a loop whether each bit is set: you can make a mask, apply a bitwise and, and check whether the result is non-empty to decide whether to grant or deny access:

BitSet resourceAccessMask = ...
BitSet userAllowedAccessMask = ...
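// Work on a copy so that and() does not modify the resource's own mask
BitSet test = (BitSet)resourceAccessMask.clone();
test.and(userAllowedAccessMask);
// A non-empty intersection means the user holds at least one bit the resource accepts
if (!test.isEmpty()) {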
    System.out.println("access granted");
} else {
    System.out.println("access denied");
}
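The same test can also be written with `BitSet.intersects`, which reports whether two sets share any set bit without cloning or modifying either one. A minimal sketch; the method name `accessGranted` is illustrative.

```java
import java.util.BitSet;

public class IntersectsExample {
    // True if the two masks have at least one set bit in common.
    // intersects() neither clones nor modifies its operands.
    static boolean accessGranted(BitSet resourceAccessMask, BitSet userAllowedAccessMask) {
        return resourceAccessMask.intersects(userAllowedAccessMask);
    }

    public static void main(String[] args) {
        BitSet resource = new BitSet();
        resource.set(2);
        resource.set(5);

        BitSet user = new BitSet();
        user.set(0);
        user.set(5);

        // prints "access granted" because bit 5 is in both masks
        System.out.println(accessGranted(resource, user) ? "access granted" : "access denied");
    }
}
```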

We used this class in a similar situation at my previous company, and the performance was acceptable for our purposes.

You could define your own Java interface for this, initially implementing it with a Java BitSet.

If you run into performance issues, or if you later need long indices, you can always provide a different implementation (e.g. one that uses caching or similar improvements) without changing the rest of the code. Think carefully about the interface you require, and choose a long index just to be safe; you can always check in the implementation whether the index is out of bounds (or simply return "no access" initially) for any index > Integer.MAX_VALUE.
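A sketch of that idea, assuming a hypothetical interface named `BinAccess` (not from any library) that takes a long index, with a first implementation backed by BitSet that denies anything it cannot represent.

```java
import java.util.BitSet;

// Hypothetical interface: name and method are illustrative only.
interface BinAccess {
    boolean canAccess(long binIndex);
}

// First implementation backed by java.util.BitSet, which only supports int indices.
final class BitSetBinAccess implements BinAccess {
    private final BitSet allowed = new BitSet();

    void allow(int binIndex) {
        allowed.set(binIndex);
    }

    @Override
    public boolean canAccess(long binIndex) {
        // Indices the BitSet cannot represent are simply denied for now.
        if (binIndex < 0 || binIndex > Integer.MAX_VALUE) {
            return false;
        }
        return allowed.get((int) binIndex);
    }
}

public class BinAccessExample {
    public static void main(String[] args) {
        BitSetBinAccess access = new BitSetBinAccess();
        access.allow(874_837);
        BinAccess view = access;
        System.out.println(view.canAccess(874_837L));               // true
        System.out.println(view.canAccess(Integer.MAX_VALUE + 1L)); // false
    }
}
```

Callers only see `BinAccess`, so a cached or paged implementation could replace `BitSetBinAccess` later without touching them.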

Using BigInteger is not such a good idea, as the class was not written for this particular purpose, and the only way of changing a BigInteger is to create a completely new copy. BitSet is efficient regarding memory use; it uses an array of 64-bit longs internally (in current implementations; this could of course change).
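A small illustration of the copy-on-change behaviour: `BigInteger.setBit` returns a new object and leaves the original untouched, so building a large mask one bit at a time copies the whole value on every call, while `BitSet.set` updates the existing object in place. The class name is illustrative.

```java
import java.math.BigInteger;
import java.util.BitSet;

public class CopyOnChangeExample {
    public static void main(String[] args) {
        // BigInteger is immutable: setBit returns a new object.
        BigInteger original = BigInteger.ZERO.setBit(874_837);
        BigInteger changed = original.setBit(3);
        System.out.println(original.testBit(3));  // false: original is unchanged
        System.out.println(changed.testBit(3));   // true
        System.out.println(original.bitLength()); // 874838

        // BitSet is mutable: set() changes the existing object.
        BitSet bits = new BitSet();
        bits.set(874_837);
        bits.set(3);
        System.out.println(bits.get(3)); // true
    }
}
```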

One thing worth considering (besides using BitSet) is a different granularity: you use a shorter bit set in which each bit "guards" multiple real bits. This way you would not need to keep millions of bits per user in RAM.

A simple way to achieve this is to keep a smaller bit set of size n/32 and do something like this:

boolean isSet(int n) {
    // Check the coarse guard bit first; the real bits are only read if it is set
    return guardingBits.get(n / 32) && realBits.get(n);
}

This gives you a good chance to avoid loading the real bits if those bits are mostly zero. You can adapt this approach to match the expected bit pattern. If you expect almost all bits to be set, you can instead use the guarding bits to store a one when all the bits they guard are set. Then you only need to check the real bits in blocks that might contain zeros.
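A fuller sketch of the "mostly zero" variant, with one guard bit per block of 32 real bits. The class name and block size are illustrative; for simplicity both bit sets live in memory here, whereas in practice the real bits would be the part loaded lazily. Only setting bits is supported, since clearing would also require maintaining the guards.

```java
import java.util.BitSet;

// Sketch of guarding bits: guard k is set if block k (bits 32k..32k+31) may contain set bits.
public class GuardedBitSet {
    private static final int BLOCK = 32;

    private final BitSet guardingBits = new BitSet(); // coarse, one bit per block
    private final BitSet realBits = new BitSet();     // full resolution

    public void set(int n) {
        guardingBits.set(n / BLOCK);
        realBits.set(n);
    }

    public boolean isSet(int n) {
        // Cheap check first; realBits is only consulted when the guard is set.
        return guardingBits.get(n / BLOCK) && realBits.get(n);
    }

    public static void main(String[] args) {
        GuardedBitSet bits = new GuardedBitSet();
        bits.set(874_837);
        System.out.println(bits.isSet(874_837)); // true
        System.out.println(bits.isSet(874_836)); // false: guard set, real bit clear
        System.out.println(bits.isSet(100));     // false: guard clear, real bits not read
    }
}
```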

This might also be just the beginning. Depending on the usage and requirements, you might want to use a B-tree or a paginated version where you only hold a fraction of the big bit field in memory.
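One way to sketch the paginated idea: split the bit field into fixed-size pages and create a page only when a bit in it is set, treating a missing page as all zeros. This is an in-memory approximation under assumptions (a `HashMap` of pages, an arbitrary 65,536-bit page size, non-negative indices); a real system might load pages from disk or a cache on demand instead.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of a paginated bit field; indices are assumed to be non-negative.
public class PagedBitField {
    private static final int PAGE_BITS = 1 << 16; // 65,536 bits (8 KiB) per page

    private final Map<Integer, BitSet> pages = new HashMap<>();

    public void set(int n) {
        pages.computeIfAbsent(n / PAGE_BITS, k -> new BitSet(PAGE_BITS)).set(n % PAGE_BITS);
    }

    public boolean get(int n) {
        BitSet page = pages.get(n / PAGE_BITS);
        return page != null && page.get(n % PAGE_BITS); // missing page = all zeros
    }

    public int loadedPages() {
        return pages.size();
    }

    public static void main(String[] args) {
        PagedBitField field = new PagedBitField();
        field.set(3);
        field.set(874_837);
        System.out.println(field.get(874_837));  // true
        System.out.println(field.get(500_000));  // false
        System.out.println(field.loadedPages()); // 2
    }
}
```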
