
Byte array of unknown length in Java: Part II

Similar to "Byte array of unknown length in java", I need to be able to write an unknown number of bytes from a data source into a byte[] array. However, I also need to read back bytes that were stored earlier, for a compression algorithm, so ByteArrayOutputStream doesn't work for me.

Right now I have a scheme where I allocate ByteBuffers of fixed size N, adding a new one as I reach N, 2N, 3N bytes, etc. After the data is exhausted I dump all buffers into an array of now-known size.

Is there a better way to do this? Having fixed-size buffers reduces the flexibility of the compression algorithm.
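For reference, the block scheme described in the question could be sketched roughly as follows. This is only an illustration under assumptions: the class name ChunkedByteStore and its methods are hypothetical, and plain byte[] blocks stand in for the ByteBuffers mentioned above. It shows that reading earlier bytes by index is possible without first merging the blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the question): stores bytes in fixed-size
// blocks and allows random reads of bytes written earlier.
class ChunkedByteStore {
    private final int blockSize;
    private final List<byte[]> blocks = new ArrayList<>();
    private int size; // total number of bytes written so far

    ChunkedByteStore(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    // Appends one byte, allocating a new block when size reaches 0, N, 2N, ...
    void write(byte b) {
        int offset = size % blockSize;
        if (offset == 0) {
            blocks.add(new byte[blockSize]);
        }
        blocks.get(blocks.size() - 1)[offset] = b;
        size++;
    }

    // Reads a previously written byte by its absolute position.
    byte get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return blocks.get(index / blockSize)[index % blockSize];
    }

    int size() {
        return size;
    }

    // Copies all blocks into one array of the now-known size.
    byte[] toByteArray() {
        byte[] out = new byte[size];
        int copied = 0;
        for (byte[] block : blocks) {
            int n = Math.min(blockSize, size - copied);
            System.arraycopy(block, 0, out, copied, n);
            copied += n;
        }
        return out;
    }
}
```

With an interface like this, the compression code only sees write, get and size, so the block size stays an internal detail.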

What about using a circular byte buffer? It can grow dynamically and is efficient.

There's an implementation here: http://ostermiller.org/utils/CircularByteBuffer.java.html

Why don't you subclass ByteArrayOutputStream? That way your subclass has access to the protected buf and count fields, and you can add methods to your class to manipulate them directly.
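A minimal sketch of that idea, assuming all that is needed is an indexed read of bytes already written; the class name ReadableByteArrayOutputStream and the get method are made up for illustration:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical subclass: exposes read access to bytes already written.
class ReadableByteArrayOutputStream extends ByteArrayOutputStream {

    // buf and count are protected fields of ByteArrayOutputStream;
    // count is the number of valid bytes currently held in buf.
    synchronized byte get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buf[index];
    }
}
```

The method is synchronized only to stay consistent with the superclass's write methods; a single-threaded caller does not need it.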

As Chris answered, the CircularByteBuffer API is the way to go. Luckily it is in the central Maven repo now. Quoting a snippet from this link, it is as simple as follows:

Single Threaded Example of a Circular Buffer

// buffer all data in a circular buffer of infinite size
CircularByteBuffer cbb = new CircularByteBuffer(CircularByteBuffer.INFINITE_SIZE);
class1.putDataOnOutputStream(cbb.getOutputStream());
class2.processDataFromInputStream(cbb.getInputStream());

Advantages are:

  • One CircularBuffer class rather than two pipe classes.
  • It is easier to convert between the "buffer all data" and "extra threads" approaches.
  • You can change the buffer size rather than relying on the hard-coded 1k of buffer in the pipes.

Finally, we are free of memory concerns and the pipes API.

The expense of the ByteArrayOutputStream is the resizing of the underlying array. Your fixed-block routine eliminates much of that. If the resizing isn't expensive enough to matter to you (i.e. in your testing the ByteArrayOutputStream is "fast enough" and doesn't cause undue memory pressure), then perhaps subclassing ByteArrayOutputStream, as suggested by vanza, would work for you.

I don't know your compression algorithm, so I can't say why your list of blocks is making it less flexible, or why the compression algorithm would even KNOW about the blocks. But since the blocks can be dynamic, you may be able to tune the block size as appropriate to better support the variety of compression algorithm you're using.

If the compression algorithm can work on a "stream" (i.e. fixed-size chunks of data), then the block size should not matter, as you could hide all of those details from the implementation. In a perfect world, the compression algorithm wants its data in chunks that match the size of the blocks you're allocating; that way you wouldn't have to copy data to feed the compressor.

While you can certainly use an ArrayList for this, you are pretty much looking at a memory overhead of 4-8 times, assuming that Bytes aren't newly allocated but share one global instance per value (since this is true for small Integers, I assume it works for Bytes as well), and you lose all cache locality.

So while you could subclass ByteArrayOutputStream, even there you get overhead that you don't need (the methods are synchronized). So I personally would just roll my own class that grows dynamically when you write to it. It is less efficient than your current method, but simple, and we all know that the cost of growing is amortized. Otherwise you can obviously use your solution as well. As long as you wrap the solution in a clean interface, you'll hide the complexity and still get good performance.

Or, put another way: no, you pretty much can't do this more efficiently than what you're already doing, and every built-in Java Collection should perform worse for one reason or another.
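The "roll my own class that grows dynamically" option from this answer might look roughly like the sketch below. The class name GrowableByteBuffer, the initial capacity of 16 and the doubling factor are assumptions chosen for illustration, not something the answer specifies.

```java
import java.util.Arrays;

// Hypothetical unsynchronized growable byte buffer with indexed reads.
class GrowableByteBuffer {
    private byte[] data = new byte[16]; // initial capacity is an arbitrary choice
    private int size;                   // number of bytes written so far

    // Appends one byte, doubling the backing array when it is full, so the
    // copying cost is amortized over many writes. Capacities near
    // Integer.MAX_VALUE are not handled in this sketch.
    void write(byte b) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = b;
    }

    // Reads a previously written byte by its position.
    byte get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    // Returns a right-sized copy of the bytes written so far.
    byte[] toByteArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Unlike a ByteArrayOutputStream subclass, nothing here is synchronized, which is the overhead this answer wants to avoid.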

For simplicity, you might consider using java.util.ArrayList:

ArrayList<Byte> a = new ArrayList<Byte>();
a.add(value1);
a.add(value2);
...
byte value = a.get(0);

Java 1.5 and higher will provide automatic boxing and unboxing between the byte and Byte types. Performance may be slightly worse than ByteArrayOutputStream, but it is easy to read and understand.

I ended up writing my own method, which uses a temporary fixed buffer array and appends it to the final byte array each time the fixed buffer is filled. It keeps overwriting the fixed buffer array and appending to the final array until all bytes are read and copied. At the end, if the temporary array is not full, it copies the bytes read so far from that array into the final array. My code was written for Android, so you may need to use a method similar to ArrayUtils.concatByteArrays (com.google.gms.common.util.ArrayUtils).

My code has the temporary array size set to 100 (growBufferSize), but it might be better to set it above 500 or even 1000, or whatever performs best in the environment you use. The final result will be stored in the bytesfinal array.

This method should reduce memory usage and so help prevent OutOfMemoryErrors. Because it mainly uses primitives, memory usage should be lower.

final int growBufferSize = 100;
byte[] fixedBuffer = new byte[growBufferSize];
byte[] bytesfinal = new byte[0];

int fixedBufferIndex = 0;
int b;
// read() returns -1 at end of stream; available() > 0 is not a reliable end-of-data test
while ((b = zin.read()) != -1) {
    fixedBuffer[fixedBufferIndex++] = (byte) b;
    if (fixedBufferIndex == growBufferSize) {
        // temporary buffer is full: append it to the result and start over
        bytesfinal = ArrayUtils.concatByteArrays(bytesfinal, fixedBuffer);
        fixedBufferIndex = 0;
    }
}

if (fixedBufferIndex != 0) {
    // copy the bytes of the partially filled buffer into a right-sized array
    byte[] lastBytes = new byte[fixedBufferIndex];
    System.arraycopy(fixedBuffer, 0, lastBytes, 0, fixedBufferIndex);

    // append the last bytes of data
    bytesfinal = ArrayUtils.concatByteArrays(bytesfinal, lastBytes);
}
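As a point of comparison, and not the code from this answer, the same read-until-exhausted loop can be sketched with only the standard library by reading in bulk and letting ByteArrayOutputStream do the appending. The class name ChunkedStreamReader and the readAll method are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: drains an InputStream into a byte[] chunk by chunk.
class ChunkedStreamReader {

    static byte[] readAll(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        int n;
        // read(byte[]) returns the number of bytes read, or -1 at end of stream
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

This avoids reallocating the result array on every full chunk, at the cost of one final copy in toByteArray.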
