ZLib decompression fails on large byte array

While experimenting with ZLib compression, I ran into a strange problem. Decompressing a zlib-compressed byte array of random data fails reproducibly if the source array is at least 32752 bytes long. Here's a little program that reproduces the problem; you can see it in action on IDEOne. The compression and decompression methods are standard code picked from tutorials.

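import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibMain {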

    private static byte[] compress(final byte[] data) {
        final Deflater deflater = new Deflater();
        deflater.setInput(data);

        deflater.finish();
        final byte[] bytesCompressed = new byte[Short.MAX_VALUE];
        final int numberOfBytesAfterCompression = deflater.deflate(bytesCompressed);
        final byte[] returnValues = new byte[numberOfBytesAfterCompression];
        System.arraycopy(bytesCompressed, 0, returnValues, 0, numberOfBytesAfterCompression);
        return returnValues;

    }

    private static byte[] decompress(final byte[] data) {
        final Inflater inflater = new Inflater();
        inflater.setInput(data);
        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length)) {
            final byte[] buffer = new byte[Math.max(1024, data.length / 10)];
            while (!inflater.finished()) {
                final int count = inflater.inflate(buffer);
                outputStream.write(buffer, 0, count);
            }
            outputStream.close();
            final byte[] output = outputStream.toByteArray();
            return output;
        } catch (DataFormatException | IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(final String[] args) {
        roundTrip(100);
        roundTrip(1000);
        roundTrip(10000);
        roundTrip(20000);
        roundTrip(30000);
        roundTrip(32000);
        for (int i = 32700; i < 33000; i++) {
            if(!roundTrip(i))break;
        }
    }

    private static boolean roundTrip(final int i) {
        System.out.printf("Starting round trip with size %d: ", i);
        final byte[] data = new byte[i];
        for (int j = 0; j < data.length; j++) {
            data[j]= (byte) j;
        }
        shuffleArray(data);

        final byte[] compressed = compress(data);
        try {
            final byte[] decompressed = CompletableFuture.supplyAsync(() -> decompress(compressed))
                                                         .get(2, TimeUnit.SECONDS);
            System.out.printf("Success (%s)%n", Arrays.equals(data, decompressed) ? "matching" : "non-matching");
            return true;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            System.out.println("Failure!");
            return false;
        }
    }

    // Implementing Fisher–Yates shuffle
    // source: https://stackoverflow.com/a/1520212/342852
    static void shuffleArray(byte[] ar) {
        Random rnd = ThreadLocalRandom.current();
        for (int i = ar.length - 1; i > 0; i--) {
            int index = rnd.nextInt(i + 1);
            // Simple swap
            byte a = ar[index];
            ar[index] = ar[i];
            ar[i] = a;
        }
    }
}

Is this a known bug in ZLib? Or do I have an error in my compress / decompress routines?

Apparently the compress() method was faulty. This one works:

public static byte[] compress(final byte[] data) {
    final Deflater deflater = new Deflater();
    try (final ByteArrayOutputStream outputStream =
                                     new ByteArrayOutputStream(data.length)) {
        deflater.setInput(data);
        deflater.finish();
        final byte[] buffer = new byte[1024];
        // Keep draining until the deflater reports that the stream is complete,
        // so output larger than one buffer is never cut off.
        while (!deflater.finished()) {
            final int count = deflater.deflate(buffer);
            outputStream.write(buffer, 0, count);
        }
        return outputStream.toByteArray();
    } catch (IOException e) {
        throw new IllegalStateException(e);
    } finally {
        // Release the native zlib memory held by the deflater.
        deflater.end();
    }
}

It is an error in the logic of the compress / decompress methods. I am not that deep into the implementations, but with debugging I found the following:

When the buffer of 32752 bytes is compressed, the deflater.deflate() method returns 32767, which is the size to which you initialized the buffer in the line:

final byte[] bytesCompressed = new byte[Short.MAX_VALUE];

If you increase the buffer size, for example to

final byte[] bytesCompressed = new byte[4 * Short.MAX_VALUE];

then you will see that the input of 32752 bytes is actually deflated to 32768 bytes. So in your code, the compressed data is truncated: it does not contain all the data that should be there.
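One way to check this without a debugger is to ask the deflater whether it actually finished after a single deflate() call. The following is only a minimal sketch: the class name SingleDeflateCheck, the helper fitsInOneCall and the 40000-byte random input are made up for illustration (the size is chosen to be comfortably above Short.MAX_VALUE, since random data does not shrink).

```java
import java.util.Random;
import java.util.zip.Deflater;

public class SingleDeflateCheck {

    // Hypothetical helper: returns true if one deflate() call into a buffer
    // of the given size was enough to emit the complete compressed stream.
    static boolean fitsInOneCall(byte[] data, int bufferSize) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[bufferSize];
            deflater.deflate(buffer);
            // finished() stays false while compressed output is still pending.
            return deflater.finished();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        // Random bytes are essentially incompressible, so the compressed
        // stream is at least as large as the input.
        byte[] data = new byte[40000];
        new Random(42).nextBytes(data);

        System.out.println("Short.MAX_VALUE buffer:     "
                + fitsInOneCall(data, Short.MAX_VALUE));
        System.out.println("4 * Short.MAX_VALUE buffer: "
                + fitsInOneCall(data, 4 * Short.MAX_VALUE));
    }
}
```

With the smaller buffer the check should come back false, meaning the array returned by the original compress() is missing the tail of the stream.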

When you then try to decompress, the inflater.inflate() method returns zero, which indicates that more input data is needed. But as you only check inflater.finished(), you end up in an endless loop.
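The endless loop can be avoided by treating "no output, not finished, nothing left to read" as an error. This is a sketch under my own assumptions, not the original poster's code: the class name SafeInflate and the choice of IllegalArgumentException are illustrative, and the compress helper is only there so the example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SafeInflate {

    // Helper for the example: drains the deflater completely.
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int count = deflater.deflate(buffer);
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Inflates the data and fails fast on a truncated stream
    // instead of spinning forever.
    static byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                // Zero bytes produced, stream not complete, and no input
                // (or a missing dictionary): no further progress is possible.
                if (count == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException(
                            "Truncated or incomplete zlib stream");
                }
                out.write(buffer, 0, count);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

Feeding this decompress() the truncated array from the question should produce an exception right away rather than a timeout.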

So you can either increase the buffer size when compressing, but that probably just means having the same problem with bigger files, or, better, rewrite the compress/decompress logic to process your data in chunks.
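If you would rather not write the chunk loops by hand, the stream wrappers in java.util.zip do the chunking for you. A minimal sketch, assuming the whole input and output still fit in memory; the class name StreamingZlib and the 8192-byte buffer are arbitrary choices for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamingZlib {

    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(data.length);
        // Closing the DeflaterOutputStream finishes the zlib stream,
        // so the sink holds the complete compressed data afterwards.
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(compressed.length);
        try (InputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int count;
            // read() returns -1 once the compressed stream has ended.
            while ((count = in.read(buffer)) != -1) {
                sink.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

Neither method depends on a fixed output buffer size, so the 32752-byte boundary from the question should no longer matter.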
