
Why not implement BitSet with a more size-deterministic type?

The Java reference on primitive data types indicates that the boolean type, while representing one "bit" of information, does not have a precisely defined size. In contrast, the sizes of the other primitive types are defined. For example, an int is 32 bits, end of story.

When we look at the spec for BitSet, we can see that it is composed of boolean values. By the reference above, this seems to suggest that the "size" of a BitSet is undefined; it's composed of boolean values, after all. And sure enough, the documentation specifies:

Note that the size is related to the implementation of a bit set, so it may change with implementation.

So my question is: why not implement a BitSet using another data type whose size is precisely defined? For example, if we used a byte, we could guarantee a size of 8 bits, and we wouldn't have the fuzzy feeling that the size may not be what we think it is. It's true that the size would then have to be a multiple of 8, but at least it seems more size-deterministic this way.
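A minimal sketch of the byte-based approach the question proposes, assuming the capacity is fixed up front. `ByteBitSet` is a hypothetical class, not part of the JDK. Note that it only makes the length of the payload array deterministic (the bit count rounded up to whole bytes); the object and array headers around it are still JVM-dependent.

```java
/** Hypothetical fixed-capacity bit set backed by a byte[] (not part of the JDK). */
public class ByteBitSet {
    private final byte[] bytes;
    private final int nbits;

    public ByteBitSet(int nbits) {
        if (nbits < 0) throw new IllegalArgumentException("nbits < 0: " + nbits);
        this.nbits = nbits;
        // Round up: 5 bits still need one whole byte, 9 bits need two.
        this.bytes = new byte[(nbits + 7) / 8];
    }

    private void checkIndex(int i) {
        // Refuse to grow: out-of-range indexes are errors, so storage never changes.
        if (i < 0 || i >= nbits) {
            throw new IndexOutOfBoundsException("bit " + i + " outside [0, " + nbits + ")");
        }
    }

    public void set(int i) {
        checkIndex(i);
        bytes[i >>> 3] |= (byte) (1 << (i & 7)); // byte index = i / 8, bit = i % 8
    }

    public void clear(int i) {
        checkIndex(i);
        bytes[i >>> 3] &= (byte) ~(1 << (i & 7));
    }

    public boolean get(int i) {
        checkIndex(i);
        return (bytes[i >>> 3] & (1 << (i & 7))) != 0;
    }

    /** Bytes used by the payload array only (object/array headers not counted). */
    public int storageBytes() {
        return bytes.length;
    }

    public static void main(String[] args) {
        ByteBitSet bits = new ByteBitSet(20);
        bits.set(3);
        bits.set(19);
        bits.clear(3);
        System.out.println(bits.get(3) + " " + bits.get(19) + " " + bits.storageBytes()); // false true 3
    }
}
```

Refusing out-of-range indexes instead of growing is what keeps the payload within a known budget; java.util.BitSet deliberately makes the opposite trade-off and grows on demand.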

If we have a system that absolutely cannot exceed a certain memory capacity, it seems useful to have a BitSet implementation that is precise in terms of size.

I think you're getting conceptually stuck on the fact that the method signatures use booleans.

The easiest way to think about a single bit is off/on, so a boolean true/false is a convenient way to model it. The BitSet internal storage is another thing entirely: if you have a look at the source code, it uses a long array and bitmasks to twiddle individual bits.
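To make the "long array plus bitmasks" idea concrete, here is a minimal sketch of a growable bit set over 64-bit words. `LongWordBits` is a hypothetical name, and this is a simplified illustration of the technique rather than the JDK's source: it omits clear(), length(), serialization and the other housekeeping java.util.BitSet does.

```java
import java.util.Arrays;

/** Hypothetical minimal growable bit set stored in 64-bit long words. */
public class LongWordBits {
    private long[] words = new long[1]; // start with room for 64 bits

    private static int wordIndex(int bitIndex) {
        return bitIndex >>> 6; // 64 bits per long, so divide by 64
    }

    public void set(int bitIndex) {
        if (bitIndex < 0) throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
        int w = wordIndex(bitIndex);
        if (w >= words.length) {
            // Grow geometrically so a run of increasing sets stays cheap.
            words = Arrays.copyOf(words, Math.max(2 * words.length, w + 1));
        }
        // Java masks a long shift distance to its low 6 bits, so this is bit (bitIndex % 64).
        words[w] |= 1L << bitIndex;
    }

    public boolean get(int bitIndex) {
        if (bitIndex < 0) throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
        int w = wordIndex(bitIndex);
        // Bits beyond the allocated words are simply false.
        return w < words.length && (words[w] & (1L << bitIndex)) != 0;
    }

    /** Number of bits currently backed by allocated words. */
    public int allocatedBits() {
        return words.length * 64;
    }

    public static void main(String[] args) {
        LongWordBits bits = new LongWordBits();
        bits.set(2);
        bits.set(130);
        System.out.println(bits.get(2) + " " + bits.get(3) + " " + bits.get(130) + " " + bits.allocatedBits());
        // prints: true false true 192
    }
}
```

The boolean in the method signatures is only the interface; each value actually occupies one bit inside a long.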

Accordingly, the size of a BitSet is tied fairly closely to the bits you actually use: size() reports the space of the long words allocated so far, which grows with the highest bit index you have set.
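A small demo of the three "size-like" numbers java.util.BitSet exposes. length() and cardinality() are defined by the Javadoc; size() is the implementation-dependent one the question quotes. The class name is only for illustration.

```java
import java.util.BitSet;

public class BitSetSizeDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(3);
        bits.set(100);

        // length(): index of the highest set bit + 1 (specified)
        System.out.println("length      = " + bits.length());      // 101
        // cardinality(): number of bits set to true (specified)
        System.out.println("cardinality = " + bits.cardinality()); // 2
        // size(): bits of space actually in use for storage (implementation-dependent)
        System.out.println("size        = " + bits.size());
    }
}
```

On recent OpenJDK versions the last line typically shows 128, a whole number of 64-bit words at least as large as length(), but the Javadoc leaves that number to the implementation.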

Part of the point of BitSet is that its length is conceptually infinite: we can manipulate arbitrarily many bits with it. We care about the semantics much more than the memory consumption, and size() is only an indication of memory consumption.
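A quick demonstration of the "conceptually infinite" semantics with java.util.BitSet: no capacity is declared, setting a far-away index simply grows the storage, and reading any non-negative index is legal. In practice bit indexes are non-negative ints, so "infinite" really means "anywhere in the int range". The class name is only for illustration.

```java
import java.util.BitSet;

public class UnboundedBitSetDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet(); // no capacity declared up front

        // Setting a far-away index just grows the backing storage.
        bits.set(10_000_000);

        // Reading any non-negative index is legal; unset bits read as false.
        System.out.println(bits.get(5));                     // false
        System.out.println(bits.get(Integer.MAX_VALUE - 1)); // false
        System.out.println(bits.get(10_000_000));            // true

        // Iterate over set bits only, however sparse they are.
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            System.out.println("set: " + i); // set: 10000000
        }
    }
}
```

The flip side is the dense representation: memory grows with the highest index touched, not with the number of set bits, so on OpenJDK this single set bit costs roughly 1.25 MB of long words.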

A BitSet is not necessarily composed of booleans; it converts the bits to booleans for ease of use (instead of making you check against 0 or 1).

Besides that, the implementation will most likely use some integer type, but depending on the architecture the bits might be stored in a bunch of 8-, 16-, 32- or 64-bit integers (or something else). In most systems memory constraints are not that strict, so a bit set with a logical size of, say, 5 bits having a real size of 1 or 8 bytes isn't that critical.
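The arithmetic behind "a logical size of 5 bits with a real size of 1 or 8 bytes" is just rounding up to whole words. This small sketch uses a hypothetical helper, `bytesFor`, and counts payload bytes only, ignoring object headers and alignment padding.

```java
public class WordStorage {
    /** Payload bytes needed to hold nbits when storage comes in words of wordBits bits. */
    static int bytesFor(int nbits, int wordBits) {
        int words = (nbits + wordBits - 1) / wordBits; // round up to whole words
        return words * (wordBits / 8);
    }

    public static void main(String[] args) {
        int nbits = 5;
        for (int wordBits : new int[] {8, 16, 32, 64}) {
            System.out.println(wordBits + "-bit words: " + bytesFor(nbits, wordBits) + " byte(s)");
        }
    }
}
```

For 5 bits this prints 1, 2, 4 and 8 bytes for 8-, 16-, 32- and 64-bit words respectively; the waste is bounded by one word, which is why it rarely matters.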

True, you could implement a bit set using bytes only, but there might be reasons to adhere to the platform's memory alignment (which might be more than one byte).

You make it sound like a BitSet needs to be an array of actual booleans; this is not so. You can, for example, look up the current implementation in the source code of your JDK. Here's a snippet:

/**
 * The internal field corresponding to the serialField "bits".
 */
private long[] words;

So in this case an array of longs is used, and the bits are accessed through bitmasking and shifts.
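You can observe the long[] layout without reading the source: `BitSet.toLongArray()` (available since Java 7) returns the bits packed into longs, and its Javadoc specifies that bit n is `(longs[n / 64] & (1L << (n % 64))) != 0`. The demo below sets bits 0 and 65 and then repeats that shift-and-mask lookup by hand; the class name is only for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

public class BitSetWordsDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);  // bit 0 of word 0
        bits.set(65); // 65 = 64 + 1, so bit 1 of word 1

        long[] words = bits.toLongArray();
        System.out.println(Arrays.toString(words)); // [1, 2]

        // Shift to find the word, mask to find the bit inside it.
        int bitIndex = 65;
        long word = words[bitIndex >>> 6];
        boolean isSet = (word & (1L << (bitIndex & 63))) != 0;
        System.out.println(isSet); // true
    }
}
```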

Unlike most primitives, the byte sizes of Java objects are not well defined: they are implementation-dependent and may even change during an application's run time due to JIT compilation and various tricks the JVM uses internally. The size of a boolean has even changed between Sun JVM releases (4 vs. 1 bytes), and if I'm not mistaken, there was a time when a single boolean would take 4 bytes while an array of N booleans would take about N*1 bytes (or perhaps it was the byte type?). Anyway, the logical size of a variable, or its information capacity, may be completely different from the physical memory the JVM allocates for it.

BitSet consists of boolean values only conceptually, and the implementation does not need to follow that logical layout. Indeed, implementations will typically pack the values into an array of primitive words (OpenJDK's java.util.BitSet uses a long[]) and use approximately one bit per value (plus some slack to allow it to grow, and some additional housekeeping data).

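Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.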

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM