
How to improve GZIP performance

My problem is that this piece of code is called more than 500k times. The compressed byte[] is smaller than 1 KB. Every time the method is called, all of the streams have to be created again, so I am looking for a way to improve this code.

private byte[] unzip(byte[] data) throws IOException, DataFormatException {

    byte[] unzipData = new byte[4096];

    try (ByteArrayInputStream in = new ByteArrayInputStream(data);
         GZIPInputStream gzipIn = new GZIPInputStream(in);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {

        int read = 0;
        while( (read = gzipIn.read(unzipData)) != -1) {
            out.write(unzipData, 0, read);
        }

        return out.toByteArray();
    }
}

I already tried replacing the ByteArrayOutputStream with a ByteBuffer, but at creation time I don't know how many bytes I need to allocate.

I also tried to use Inflater, but I stumbled across the problem described here.

Are there any other ideas for improving the performance of this code?

UPDATE #1

  • Maybe this lib helps someone.
  • There is also an open JDK bug.

  1. Profile your application to be sure that you're really spending optimizable time in this function. It doesn't matter how many times you call this function; if it doesn't account for a significant fraction of overall program execution time, optimizing it is wasted effort.

  2. Pre-size the ByteArrayOutputStream. The default buffer size is 32 bytes, and each resize requires copying all existing bytes. If you know that your decoded arrays will be around 1k, use new ByteArrayOutputStream(2048).

  3. Rather than reading a byte at a time, read a block at a time, using a pre-allocated byte[]. Beware that you must pass the return value of read as the length argument to write, because a read may fill only part of the array. Better, use something like IOUtils.copy() from Commons IO (formerly Jakarta Commons) to avoid mistakes.
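
Points 2 and 3 can be sketched roughly as follows. This is only an illustration: the class name PresizedUnzip is made up for the example, and the 2048-byte initial capacity assumes decoded payloads of around 1k, which you should check against your own data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, named for this example only.
final class PresizedUnzip {

    static byte[] unzip(byte[] data) throws IOException {
        // Pre-allocated block used for every read.
        byte[] block = new byte[4096];

        try (GZIPInputStream gzipIn = new GZIPInputStream(new ByteArrayInputStream(data));
             // Assumed capacity: large enough that a ~1k payload never triggers a resize.
             ByteArrayOutputStream out = new ByteArrayOutputStream(2048)) {

            int read;
            while ((read = gzipIn.read(block)) != -1) {
                // Write only the bytes that were actually read, not the whole block.
                out.write(block, 0, read);
            }
            return out.toByteArray();
        }
    }
}
```

If the output is larger than the initial capacity, the stream still grows as usual, so a wrong guess costs speed but not correctness.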

I'm not sure if it applies in your case, but I've found a huge speed difference between using the default buffer size of GZIPInputStream and increasing it to 65536.

Example, using a 500M input file:

new GZIPInputStream(new FileInputStream(path.toFile())) // takes 4 mins to process

vs

new GZIPInputStream(new FileInputStream(path.toFile()), 65536) // takes 10s


More details can be found here: http://java-performance.info/java-io-bufferedinputstream-and-java-util-zip-gzipinputstream/

Both BufferedInputStream and GZIPInputStream have internal buffers. The default size for the former is 8192 bytes and for the latter 512 bytes. It is generally worth increasing either of these sizes to at least 65536.
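
For file input, a small helper along these lines keeps the buffer size in one place. It is a sketch only: the class name BufferedGzipFiles and the constant BUFFER_SIZE are invented for this example, and 65536 is the value suggested above rather than a measured optimum. Because GZIPInputStream asks the underlying stream for up to a full buffer on each refill, the sketch does not add a separate BufferedInputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, named for this example only.
final class BufferedGzipFiles {

    // Assumed value taken from the suggestion above; tune it for your workload.
    static final int BUFFER_SIZE = 65536;

    static InputStream open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        try {
            // The second argument sets the size of GZIPInputStream's internal input buffer.
            return new GZIPInputStream(raw, BUFFER_SIZE);
        } catch (IOException e) {
            // The constructor reads the gzip header and can fail; don't leak the file handle.
            raw.close();
            throw e;
        }
    }
}
```

The caller is expected to close the returned stream, for example with try-with-resources.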

You can use the Inflater class method reset() to reuse the Inflater object without having to recreate it each time. You will have a little bit of added programming to do in order to decode the gzip header and perform the integrity check with the gzip trailer. You would then use Inflater with the nowrap option to decompress the raw deflate data after the gzip header and before the trailer.
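
A minimal sketch of that approach might look like the code below. The class name ReusableGunzip and its helper methods are hypothetical. It assumes each input holds exactly one gzip member (header fields as laid out in RFC 1952), and an instance is not thread-safe because it reuses its Inflater, CRC32 and buffers. As one possible design choice, the trailer bytes are passed to the Inflater along with the compressed data, and getRemaining() is then used to locate the trailer; this also means the inflater always has extra input bytes after the deflate stream, which the Inflater documentation asks for in nowrap mode.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical class for illustration; one instance per thread.
final class ReusableGunzip {
    // Header flag bits from RFC 1952.
    private static final int FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10;

    private final Inflater inflater = new Inflater(true); // nowrap: raw deflate data
    private final CRC32 crc = new CRC32();
    private final byte[] block = new byte[4096];
    private final ByteArrayOutputStream out = new ByteArrayOutputStream(2048);

    byte[] unzip(byte[] data) throws IOException, DataFormatException {
        int start = headerLength(data);
        inflater.reset();
        crc.reset();
        out.reset();
        // Hand over everything after the header, trailer included.
        inflater.setInput(data, start, data.length - start);

        while (!inflater.finished()) {
            int n = inflater.inflate(block);
            if (n == 0 && !inflater.finished()
                    && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new IOException("Truncated or unsupported deflate stream");
            }
            crc.update(block, 0, n);
            out.write(block, 0, n);
        }

        // Bytes the inflater did not consume are the trailer: CRC32, then ISIZE.
        int remaining = inflater.getRemaining();
        if (remaining < 8) {
            throw new IOException("Missing gzip trailer");
        }
        int trailer = data.length - remaining;
        long expectedCrc = readUInt32(data, trailer);
        long expectedSize = readUInt32(data, trailer + 4);
        if (crc.getValue() != expectedCrc
                || (inflater.getBytesWritten() & 0xFFFFFFFFL) != expectedSize) {
            throw new IOException("gzip trailer mismatch");
        }
        return out.toByteArray();
    }

    // Returns the offset of the first deflate byte.
    private static int headerLength(byte[] d) throws IOException {
        if (d.length < 18 || (d[0] & 0xFF) != 0x1f || (d[1] & 0xFF) != 0x8b || d[2] != 8) {
            throw new IOException("Not a gzip stream");
        }
        int flags = d[3] & 0xFF;
        int pos = 10; // fixed part: magic, method, flags, mtime, xfl, os
        if ((flags & FEXTRA) != 0) {
            pos += 2 + ((d[pos] & 0xFF) | ((d[pos + 1] & 0xFF) << 8));
        }
        if ((flags & FNAME) != 0) pos = skipZeroTerminated(d, pos);
        if ((flags & FCOMMENT) != 0) pos = skipZeroTerminated(d, pos);
        if ((flags & FHCRC) != 0) pos += 2;
        if (pos > d.length) throw new IOException("Truncated gzip header");
        return pos;
    }

    private static int skipZeroTerminated(byte[] d, int pos) throws IOException {
        while (pos < d.length && d[pos] != 0) pos++;
        if (pos >= d.length) throw new IOException("Truncated gzip header");
        return pos + 1;
    }

    // Little-endian unsigned 32-bit value.
    private static long readUInt32(byte[] d, int pos) {
        return (d[pos] & 0xFFL) | ((d[pos + 1] & 0xFFL) << 8)
                | ((d[pos + 2] & 0xFFL) << 16) | ((d[pos + 3] & 0xFFL) << 24);
    }
}
```

Create one ReusableGunzip up front and call unzip(data) for each array; whether this actually beats GZIPInputStream for sub-1KB inputs is something only profiling your workload can confirm.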
