Java GZIPInputStream.read() function

Question

In the following line, when instream is a GZIPInputStream, I found that the values of c are totally random, either greater or less than 1024. But when instream is a FileInputStream, the returned value is always 1024.

int c;
while ((c = instream.read(buffer, offset, 1024)) != -1)
    System.out.println("Bytes read: " + c);

The input source file size is much more than 1024 bytes. Why is the returned value of GZIPInputStream unpredictable? Shouldn't it always read up to the said value 1024? Thanks!

Answer 1

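It's just an artifact of compression. GZIP data is stored in compressed blocks of variable size, and typically a block cannot be read until all of it has been decompressed. As a result, each read() returns whatever decompressed data is available at that point instead of waiting to fill the requested 1024 bytes.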

You are asking for data in fixed 1024-byte chunks:

0           1024           2048           3072           4096...

But if the compressed blocks' boundaries look like this:

0       892     1201        2104         2924 ...

You're going to get a first read of 892 bytes, then 309 (1201-892), then 903 (2104-1201), etc. This is a slight over-simplification, but not much.

As Miserable Variable commented above, read() should never return MORE than 1024; otherwise that would imply a buffer overrun.
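
To see this behaviour for yourself, here is a minimal sketch (not part of the original answer) that gzips 100,000 random bytes in memory and records what each read() call returns. The class name GzipReadSizes and the helpers gzip and readSizes are made up for illustration. The individual sizes depend on the JDK's internal buffering, so no expected output is shown; the only things you can rely on are that each read returns at most the requested length and that the total matches the original data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipReadSizes {

    // Hypothetical helper: compresses data into an in-memory GZIP stream.
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    // Hypothetical helper: runs the same loop as in the question and
    // records the value returned by every read() call.
    public static List<Integer> readSizes(InputStream in, int chunk) throws IOException {
        byte[] buffer = new byte[chunk];
        List<Integer> sizes = new ArrayList<>();
        int c;
        while ((c = in.read(buffer, 0, chunk)) != -1) {
            sizes.add(c);
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        new Random(42).nextBytes(data);

        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(data)))) {
            List<Integer> sizes = readSizes(in, 1024);
            int total = 0;
            int max = 0;
            for (int s : sizes) {
                total += s;
                max = Math.max(max, s);
            }
            System.out.println("Number of reads: " + sizes.size());
            System.out.println("Largest read:    " + max);   // never above 1024
            System.out.println("Total bytes:     " + total); // equals data.length
        }
    }
}
```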

Answer 2

No, the returned value does not need to be equal to 1024. Consider what should be returned for a file of only 4 bytes. Always use the returned value for processing. Also, depending on the encoding and the underlying source, a read may return less than you expect due to circumstances outside your control (e.g. a network that only delivers 512 bytes/sec).
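
If you really need full chunks, one option is to keep calling read() until the buffer is filled or the stream ends. The following is a minimal sketch under that assumption; the class and method name ReadFully.readFully are hypothetical and not from the original answer. On Java 9 and later, InputStream.readNBytes(byte[], int, int) offers similar behaviour out of the box.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {

    // Hypothetical helper: keeps reading until len bytes have been stored
    // in buf or the stream ends. Returns the number of bytes actually read,
    // which is smaller than len only when end of stream was reached.
    public static int readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int c = in.read(buf, off + total, len - total);
            if (c == -1) {
                break; // end of stream
            }
            total += c;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[1024];
        // A 4-byte source, as in the example above: only 4 bytes can come back.
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        System.out.println("Bytes read: " + readFully(in, buffer, 0, 1024)); // prints "Bytes read: 4"
    }
}
```

With a GZIPInputStream as the source, this returns 1024 on every call except possibly the last one before end of stream.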

2 answers

solution1
1 2012-03-04 03:29:56

solution2
0 2012-03-04 03:18:46
