
Java: concatenating compressed data with Deflater or GZIPOutputStream

We have a bunch of threads that each take a block of data, compress it, and eventually concatenate the results into one large byte array. If anyone can expand on this idea or recommend another method, that'd be awesome. I've currently got two methods that I'm trying out, but neither is working the way it should:


The first: each thread's run() function takes the input data, compresses it with GZIPOutputStream, and writes the result to the buffer.

The problem with this approach is that each thread holds only one block of a longer, complete input, so when I call GZIPOutputStream it treats that little block as a complete piece of data to zip. That means it sticks a header and trailer on every block (I also use a custom dictionary, so I have no idea how many bits the header is now, nor how to find out).
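To make the per-block framing concrete, here is a minimal sketch of the first method without the custom dictionary. The class and method names (`GzipBlockDemo`, `gzipBlock`) are made up for illustration. It only shows that every independently compressed block carries its own gzip magic bytes at the start and its own 8-byte trailer (CRC-32 plus uncompressed length) at the end.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipBlockDemo {
    // Hypothetical helper: compresses one block on its own, as each thread's run() does.
    static byte[] gzipBlock(byte[] block) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(block);
        }
        // Closing the stream has flushed the remaining data and written the trailer.
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] out = gzipBlock("one block of a longer input".getBytes(StandardCharsets.UTF_8));
        int n = out.length;

        // Header: magic bytes 0x1f 0x8b, then the compression method (8 = deflate).
        System.out.printf("magic: %02x %02x, method: %d%n", out[0] & 0xff, out[1] & 0xff, out[2]);

        // Trailer: the last 4 bytes hold the uncompressed length, little-endian.
        int isize = (out[n - 4] & 0xff)
                | ((out[n - 3] & 0xff) << 8)
                | ((out[n - 2] & 0xff) << 16)
                | ((out[n - 1] & 0xff) << 24);
        System.out.println("uncompressed length recorded in trailer: " + isize);
    }
}
```

Because the trailer's CRC-32 and length describe only that one block, they say nothing about the other blocks.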

I think you could manually cut off the header and trailer and be left with just the compressed data (keeping the header of the first block and the trailer of the last block). The other thing I'm not sure about with this method is whether I can even do that. If I leave the header on the first block of data, will it still decompress correctly? Doesn't that header contain information for ONLY the first block of the data and not the other concatenated blocks?


The second method is to use the Deflater class. In that case, I can simply set the input, set the dictionary, and then call deflate().

The problem is, that's not gzip format. That's just "raw" compressed data. I have no idea how to make it so that gzip can recognize the final output.
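For reference, a rough sketch of how a single Deflater output can be framed so that gzip readers accept it. It makes several assumptions: the helper names are hypothetical, Java 9+ is assumed for `readAllBytes`, and the custom dictionary is left out because the gzip header has no field for a preset dictionary. Note that `new Deflater()` with default arguments emits zlib-wrapped data, while passing `nowrap = true` produces raw DEFLATE, which is what goes between a gzip header and trailer. This handles one block only and does not address joining several blocks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

class RawDeflateToGzip {
    // Hypothetical helper: deflates the input as raw DEFLATE and frames it as one gzip member.
    static byte[] gzipWithDeflater(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Fixed 10-byte header: magic, CM=8 (deflate), no flags, no mtime, XFL=0, OS=255 (unknown).
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff}, 0, 10);

        // nowrap=true: no zlib header and no Adler-32 checksum, just the DEFLATE stream.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();

        // 8-byte trailer: CRC-32 of the uncompressed data, then its length, both little-endian.
        CRC32 crc = new CRC32();
        crc.update(input);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, input.length);
        return out.toByteArray();
    }

    static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }

    // Reads the result back with the standard gzip decoder to check the framing.
    static byte[] gunzip(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzipWithDeflater("raw deflate, gzip framing".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(gunzip(gz), StandardCharsets.UTF_8));
    }
}
```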

You need a method that writes to a single GZIPOutputStream and is called by the other threads, with suitable coordination between them so the data doesn't get mixed up. Or else have the threads write to temporary files, and assemble and zip it all in a second phase.
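One possible shape for the first suggestion, as a sketch only: the class `SharedGzipWriter`, its `writeBlock` method, and the block-index scheme are assumptions made for illustration, not something the answer specifies. Each thread hands its block to a synchronized method, which waits until all earlier blocks have been written, so the single stream sees the data in order and produces one header and one trailer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class SharedGzipWriter {
    private final GZIPOutputStream gzip;
    private int nextIndex = 0; // index of the block that may be written next

    SharedGzipWriter(OutputStream sink) throws IOException {
        this.gzip = new GZIPOutputStream(sink);
    }

    // Blocks the calling thread until every lower-numbered block has been written.
    synchronized void writeBlock(int index, byte[] block) throws IOException, InterruptedException {
        while (index != nextIndex) {
            wait();
        }
        gzip.write(block);
        nextIndex++;
        notifyAll();
    }

    synchronized void close() throws IOException {
        gzip.close(); // writes the single trailer
    }

    // Starts one thread per block; each thread writes its own block through the shared writer.
    static byte[] compressInOrder(byte[][] blocks) throws IOException, InterruptedException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        SharedGzipWriter writer = new SharedGzipWriter(sink);
        Thread[] threads = new Thread[blocks.length];
        for (int i = 0; i < blocks.length; i++) {
            final int index = i;
            threads[i] = new Thread(() -> {
                try {
                    writer.writeBlock(index, blocks[index]);
                } catch (IOException | InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        writer.close();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[][] blocks = {
            "alpha ".getBytes(StandardCharsets.UTF_8),
            "beta ".getBytes(StandardCharsets.UTF_8),
            "gamma".getBytes(StandardCharsets.UTF_8)
        };
        byte[] gz = compressInOrder(blocks);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The trade-off is that compression itself is serialized through the one stream; the threads run in parallel only while preparing their blocks. The sketch also has no error recovery: if one thread fails before writing, the threads waiting on later indices never proceed.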
