Gzip compression and decompression without any encoding

I want to decompress, in Java, a string that was gzip-compressed in Python.

Normally I base64-encode the compressed string in Python, then base64-decode it in Java before decompressing. This works fine.

But is there a way to decompress, in Java, a string that was gzip-compressed in Python without using base64 encoding?

What I actually want is to HTTP POST the compressed binary data to a server, where it gets decompressed. The compression and the HTTP POST are done in Python; the server side is Java.

I tried this code without the base64 encoding step in Python, read the file in Java with a BufferedReader, converted the resulting string to a byte[] with getBytes(), and passed that to GZIPInputStream for decompression. This throws an exception:

java.io.IOException: Not in GZIP format
    at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:154)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:75)
    at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:85)
    at GZipFile.gunzipIt(GZipFile.java:58)
    at GZipFile.main(GZipFile.java:42)

How can I perform the compression and decompression without any encoding? Is there a way to send binary data in an HTTP POST from Python?

This is the compression code in Python:

# Python 2
import StringIO
import gzip
import base64

m = 'hello' + '\r\n' + 'world'

# Compress into an in-memory buffer
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="wb") as f:
    f.write(m)

# Base64-encode the compressed bytes and write them to a file
f = open('comp_dump', 'wb')
f.write(base64.b64encode(out.getvalue()))
f.close()

This is the decompression code in Java:

import java.io.*;
import java.util.zip.GZIPInputStream;
import javax.xml.bind.DatatypeConverter;

public class GZipFile
{


    public static String readCompressedData()throws Exception
    {
            String compressedStr ="";
            String nextLine;
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("comp_dump")));
            try
            {
                    while((nextLine=reader.readLine())!=null)
                    {
                            compressedStr += nextLine;
                    }
            }
            finally
            {
                    reader.close();
            }
            return compressedStr;
    }

    public static void main( String[] args ) throws Exception
    {
            GZipFile gZip = new GZipFile();
            byte[] contentInBytes = DatatypeConverter.parseBase64Binary(readCompressedData());

            String decomp = gZip.gunzipIt(contentInBytes);
            System.out.println(decomp);
    }

    /**
     * GunZip it
     */
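    public static String gunzipIt(final byte[] compressed){

            byte[] buffer = new byte[1024];
            ByteArrayOutputStream decomp = new ByteArrayOutputStream();

            // try-with-resources closes the stream even if read() fails
            try (GZIPInputStream gzis = new GZIPInputStream(new ByteArrayInputStream(compressed))) {

                    int len;
                    // read() returns -1 at end of stream
                    while ((len = gzis.read(buffer)) != -1) {
                            decomp.write(buffer, 0, len);
                    }

            }catch(IOException ex){
                    ex.printStackTrace();
            }
            // Decode once, with an explicit charset, so a multi-byte character
            // split across two buffer reads is not corrupted
            return new String(decomp.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
    }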

}

Not every byte[] can be converted to a string, and the conversion back could give other bytes.
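
To illustrate the point, here is a small sketch in Python 3 (not the Python 2 used in the question) that treats gzip output as text and converts it back. It only approximates what a character-based reader followed by getBytes() does in Java, and the use of UTF-8 with replacement characters is an assumption chosen for the demonstration.

```python
import gzip

original = "hello\r\nworld".encode("utf-8")
compressed = gzip.compress(original)

# Treat the compressed bytes as text, the way a character-based reader would.
# The gzip magic number is 0x1f 0x8b, and 0x8b is not a valid UTF-8 start byte,
# so it is replaced by U+FFFD during decoding.
as_text = compressed.decode("utf-8", errors="replace")

# Converting the text back to bytes does not restore the original bytes.
round_tripped = as_text.encode("utf-8")

print(round_tripped == compressed)  # False

try:
    gzip.decompress(round_tripped)
    still_valid = True
except OSError:
    # The header is no longer recognised as gzip
    still_valid = False
```

The second byte of the header is already damaged, which is consistent with the "Not in GZIP format" error raised while reading the header.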

Specify the character encoding explicitly when you compress, and use the same one when you decompress. Otherwise your OS, JVM, etc. will pick a default for you, and the two sides will probably not agree.

For example, on my Linux machine:

Python

import sys
print sys.getdefaultencoding()
>> ascii

Java

System.out.println(Charset.defaultCharset());
>> UTF-8

Related answer: https://stackoverflow.com/a/14467099/3014866
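
As for sending the binary data in an HTTP POST from Python, a sketch with the Python 3 standard library could look like the following. The URL, the build_gzip_post helper, and the application/octet-stream content type are assumptions for illustration and not something the question specifies.

```python
import gzip
import urllib.request


def build_gzip_post(url: str, text: str) -> urllib.request.Request:
    """Build a POST request whose body is the raw gzip bytes (no base64)."""
    payload = gzip.compress(text.encode("utf-8"))
    return urllib.request.Request(
        url,
        data=payload,  # bytes are sent as-is in the request body
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Hypothetical endpoint; replace with the real server URL.
request = build_gzip_post("http://localhost:8080/upload", "hello\r\nworld")

# To actually send it (requires a server listening at that URL):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

On the Java side, the request body should then be handed to GZIPInputStream as an InputStream, without going through a Reader or a String first.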
