简体   繁体   中英

Byte array of unknown length in Java: Part II

Similar to "Byte array of unknown length in java" I need to be able to write an unknown number of bytes from a data source into a byte[] array. However I need the ability to read from bytes that were stored earlier, for a compression algorithm, so ByteArrayOutputStream doesn't work for me.

Right now I have a scheme where I allocate ByteBuffers of fixed size N, adding a new one as I reach N, 2N, 3N bytes etc. After the data is exhausted I dump all buffers into an array of now-known size.

Is there a better way to do this? Having fixed-size buffers reduces the flexibility of the compression algorithm.

What about using a circular byte buffer? It has the possibility to grow dynamically and is efficient.

There's an implementation here: http://ostermiller.org/utils/CircularByteBuffer.java.html

Why don't you subclass ByteArrayOutputStream ? That way your subclass has access to the protected buf and count fields, and you can add methods to your class to manipulate them directly.

As Chris answered the CircularByteBuffer api is the way to go. Luckily it is in central maven repo now. Quoting a snippet from this link , it is as simple as follows:

Single Threaded Example of a Circular Buffer

// buffer all data in a circular buffer of infinite size
CircularByteBuffer cbb = new CircularByteBuffer(CircularByteBuffer.INFINITE_SIZE);
class1.putDataOnOutputStream(cbb.getOutputStream());
class2.processDataFromInputStream(cbb.getInputStream());

Advantages are:

  • One CircularBuffer class rather than two pipe classes.
  • It is easier to convert between the "buffer all data" and "extra threads" approaches.
  • You can change the buffer size rather than relying on the hard-coded 1k of buffer in the pipes.

Finally we are free of memory concerns and pipes API

The expense of the ByteArrayOutputStream is the resizing of the underlying array. Your fixed block routine eliminates much of that. If the resizing isn't expensive enough to you to matter (ie in your testing the ByteArrayOutputStream is "fast enough", and doesn't provide undo memory pressure), then perhaps subclassing ByteArrayOutputStream, as suggested by vanza, would work for you.

I don't know your compression algorithm, so I can't say why your list of blocks is making it less flexible, or even why the compression algorithm would even KNOW about the blocks. But since the blocks can by dynamic, you may be able to tune the block size as appropriate to better support the variety of the compression algorithm you're using.

If the compression algorithm can work on a "stream" (ie fixed size chunks of data), then the block size should matter as you could hide all of those details from the implementation. The perfect world is if the compression algorithm wants its data in chunks that match the size of the blocks your allocating, that way you wouldn't have to copy data to feed the compressor.

While you can certainly use an ArrayList for this, you pretty much look at an memory overhead of 4-8times - assuming that bytes aren't newly allocated but share one global instance (since this is true for integers I assume it works for Bytes as well) - and you lose all cache locality.

So while you could subclass ByteArrayOutputStream, but even there you get overhead (the methods are synchronized) that you don't need. So I personally would just roll out my own class that grows dynamically when you write to it. Less efficient than your current method, but simple and we all know the part with the amortized costs - otherwise you can obviously use your solution as well. As long as you wrap the solution in a clean interface you'll hide the complexity and still get the good performance

Or otherwise said: No you pretty much can't do this more efficiently than what you're already doing and every built-in java Collection should perform worse for one reason or the other.

For simplicity, you might consider using java.util.ArrayList :

ArrayList<Byte> a = new ArrayList<Byte>();
a.add(value1);
a.add(value2);
...
byte value = a.get(0);

Java 1.5 and higher will provide automatic boxing and unboxing between the byte and Byte types. Performance may be slightly worse than ByteArrayOutputStream , but it is easy to read and understand.

I ended up writing my own method which uses a temporary fixed buffer array and appends it to your final byte array each time after the fixed buffer is filled. It will continue to overwrite the fixed buffer array and append to your final array until all bytes are read and copied. At the end, if the temporaryArray is not filled, it will copy the read bytes from that array into the final array. My code was written for Android, so you may need to use a similar method to ArrayUtils.concatByteArrays (com.google.gms.common.util.ArrayUtils)

My code has the temporary array size set to 100 ( growBufferSize ) but it might be better to set above 500 or even 1000 or whatever performs best on the environment you use. The final result will be stored in the bytesfinal array.

This method should reduce memory usage to prevent OutOfMemoryError s. Since it is using mainly primitives, memory should be reduced.

final int growBufferSize = 100;
byte[] fixedBuffer = new byte[growBufferSize];
byte[] bytesfinal = new byte[0];

int fixedBufferIndex=0;
while (zin.available()>0){
    fixedBuffer[fixedBufferIndex] = (byte)zin.read();
    if (fixedBufferIndex == growBufferSize-1){
        bytesfinal = ArrayUtils.concatByteArrays(bytesfinal,fixedBuffer);
        fixedBufferIndex = -1;
    }

    fixedBufferIndex++;
}

if (fixedBufferIndex!=0){
    byte[] lastBytes = new byte[fixedBufferIndex];
    //copy last read bytes to new array
    for (int i = 0; i<fixedBufferIndex; i++){
        lastBytes[i]=fixedBuffer[i];
    }

    //add last bits of data
    bytesfinal = ArrayUtils.concatByteArrays(bytesfinal,lastBytes);
    lastBytes = null;
    fixedBuffer = null;
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM