Gzip compression and decompression without any encoding

I want to decompress a string in Java that was gzip-compressed in Python.

Normally, I base64-encode the compressed string in Python and then decode it in Java before decompressing. This works fine.

But is there a way to decompress, in Java, a string that was gzip-compressed in Python without using base64 encoding?

Actually, I want to HTTP POST the compressed binary data to a server, where the binary data gets decompressed. The compression and the HTTP POST are done in Python, and the server side is Java.

I tried this code without the base64 encoding in Python, read the file in Java using a BufferedReader, and then converted the compressed string into a byte[] using getBytes(), which I passed to GZIPInputStream for decompression. But this throws an exception:

java.io.IOException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:154)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:75)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:85)
    at GZipFile.gunzipIt(GZipFile.java:58)
    at GZipFile.main(GZipFile.java:42)

Please give me a solution to perform compression and decompression without any encoding. Is there a way to send binary data in an HTTP POST in Python?

This is the compression code in Python:

import StringIO
import gzip
import base64

m = 'hello' + '\r\n' + 'world'

# Gzip-compress the message into an in-memory buffer.
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="wb") as f:
    f.write(m)

# Base64-encode the compressed bytes and write them to a file.
with open('comp_dump', 'wb') as dump:
    dump.write(base64.b64encode(out.getvalue()))

This is the decompression code in Java:

//$Id$

import java.io.*;  
import java.io.FileInputStream;  
import java.io.FileOutputStream;  
import java.io.IOException;  
import java.util.zip.GZIPInputStream;  
import javax.xml.bind.DatatypeConverter;  
import java.util.Arrays;

public class GZipFile
{


    public static String readCompressedData()throws Exception
    {
            String compressedStr ="";
            String nextLine;
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("comp_dump")));
            try
            {
                    while((nextLine=reader.readLine())!=null)
                    {
                            compressedStr += nextLine;
                    }
            }
            finally
            {
                    reader.close();
            }
            return compressedStr;
    }

    public static void main( String[] args ) throws Exception
    {
            GZipFile gZip = new GZipFile();
            byte[] contentInBytes = DatatypeConverter.parseBase64Binary(readCompressedData());

            String decomp = gZip.gunzipIt(contentInBytes);
            System.out.println(decomp);
    }

    /**
     * GunZip it
     */
    public static String gunzipIt(final byte[] compressed){

            byte[] buffer = new byte[1024];
            StringBuilder decomp = new StringBuilder() ;

            try{

                    GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed));

                    int len;
                    while ((len = gzis.read(buffer)) > 0) {

                            decomp.append(new String(buffer, 0, len));

                    }

                    gzis.close();

            }catch(IOException ex){
                    ex.printStackTrace();
            }
            return decomp.toString();
    }

}

Not every byte[] can be converted to a String, and converting back may yield different bytes.
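The effect can be reproduced in a few lines. The following is a minimal sketch in Python 3 (an assumption; the question uses Python 2), where the hypothetical helper round_trip_through_text only imitates what reading the file as text and calling getBytes() does on the Java side. It is not the asker's code.

```python
import gzip


def round_trip_through_text(data, encoding="utf-8"):
    """Decode bytes as text and encode them again, replacing invalid sequences.

    Hypothetical helper: it imitates a text reader followed by getBytes().
    """
    return data.decode(encoding, errors="replace").encode(encoding)


# Gzip output is arbitrary binary data, not valid text in any particular charset.
original = gzip.compress("hello\r\nworld".encode("utf-8"))
damaged = round_trip_through_text(original)

print(original[:2] == b"\x1f\x8b")  # True: the gzip magic number is present
print(damaged[:2] == b"\x1f\x8b")   # False: the second magic byte was replaced

try:
    gzip.decompress(damaged)
except OSError as exc:
    # The damaged bytes no longer start with a gzip header.
    print("decompression failed:", exc)
```

The second byte of every gzip stream is 0x8b, which is not a valid standalone UTF-8 byte, so the header is already damaged after the text round trip. This matches the "Not in GZIP format" error raised while reading the header.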

Define the encoding explicitly when you compress, and use the same one when you decompress. Otherwise your OS, JVM, etc. will choose one for you, and will probably mess it up.

For example, on my Linux machine:

Python:

import sys
print sys.getdefaultencoding()
>> ascii

Java:

System.out.println(Charset.defaultCharset());
>> UTF-8
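As a rough illustration of that advice, here is a sketch in Python 3 (an assumption; the question's code is Python 2). The names ENCODING, compress_text and decompress_text are hypothetical, and UTF-8 is just one possible choice. The charset is applied only to the text before compression and after decompression, while the compressed data itself is written and read as raw bytes in binary mode, with no base64 step.

```python
import gzip
import os
import tempfile

ENCODING = "utf-8"  # assumed choice; the decompressing side must use the same one


def compress_text(text):
    """Encode text with an explicit charset, then gzip the resulting bytes."""
    return gzip.compress(text.encode(ENCODING))


def decompress_text(data):
    """Gunzip raw bytes, then decode them with the same explicit charset."""
    return gzip.decompress(data).decode(ENCODING)


message = "hello\r\nworld"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "comp_dump")
    # Binary mode on both sides: the compressed bytes never pass through a string.
    with open(path, "wb") as f:
        f.write(compress_text(message))
    with open(path, "rb") as f:
        restored = decompress_text(f.read())

print(restored == message)
```

On the Java side the equivalent would be to read the file through an InputStream rather than a Reader, and to pass a charset explicitly when building the String from the decompressed bytes.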

Related answer: https://stackoverflow.com/a/14467099/3014866
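For the HTTP POST part of the question, binary data can be sent as the request body directly. The following is a sketch using Python 3's standard urllib.request (an assumption; the question uses Python 2). The function name build_gzip_post, the URL, and the choice of headers are all hypothetical, and the request is only built here, not sent.

```python
import gzip
import urllib.request


def build_gzip_post(url, text, encoding="utf-8"):
    """Build a POST request whose body is the gzip-compressed text as raw bytes."""
    body = gzip.compress(text.encode(encoding))
    return urllib.request.Request(
        url,
        data=body,  # bytes are sent as-is, with no base64 step
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            # Assumed header: tells the server the body is gzip-compressed.
            "Content-Encoding": "gzip",
        },
    )


# Placeholder URL; replace with the real server endpoint.
request = build_gzip_post("http://localhost:8080/upload", "hello\r\nworld")
print(request.get_method(), len(request.data), "bytes")

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The server would then need to read the request body as bytes (an InputStream, not a Reader) and hand those bytes to GZIPInputStream. Whether to set Content-Encoding depends on the server, since some frameworks decompress such bodies automatically.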
