
Getting a compressed version of a web page

I am using HttpClient 4.1 to download a web page. I would like to get a compressed version:

    HttpGet request = new HttpGet(url);
    request.addHeader("Accept-Encoding", "gzip,deflate");

    HttpResponse response = httpClient.execute(request,localContext);
    HttpEntity entity = response.getEntity();

response.getFirstHeader("Content-Encoding") shows "Content-Encoding: gzip"; however, entity.getContentEncoding() is null.

If I put:

entity = new GzipDecompressingEntity(entity);

I get:

java.io.IOException: Not in GZIP format

It looks like the resulting page is plain text and not compressed, even though the "Content-Encoding" header says it is gzipped.
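
One way to verify this is to look at the first two bytes of the body: a real gzip stream starts with the magic bytes 0x1f 0x8b. A minimal diagnostic sketch (EntityUtils is HttpClient's org.apache.http.util.EntityUtils; note that toByteArray consumes the entity):

    byte[] raw = EntityUtils.toByteArray(entity);
    boolean looksGzipped = raw.length >= 2
            && (raw[0] & 0xff) == 0x1f
            && (raw[1] & 0xff) == 0x8b;
    System.out.println("really gzipped: " + looksGzipped);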

I have tried this on several URLs (from different websites) but get the same results.

How can I get a compressed version of a web page?

You don't need HttpClient if you don't want a library handling mundane things like unzipping for you.

You can use the JDK's basic URLConnection class to fetch the compressed stream, as the following code demonstrates:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class Fetch {
        public static void main(String[] args) {
            try {
                URL url = new URL("http://code.jquery.com/jquery-latest.js");
                URLConnection con = url.openConnection();
                // Ask for a compressed response; comment out the next line if you
                // want something readable in your console instead of raw gzip bytes.
                con.addRequestProperty("Accept-Encoding", "gzip,deflate");
                BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
                in.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
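
If you want readable text while still asking for compression, decode the stream according to what the server actually sent back. A minimal sketch using the JDK's java.util.zip classes (it assumes UTF-8 text; note that a few servers send raw deflate data, which would additionally need new Inflater(true)):

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.InflaterInputStream;

    public class FetchDecoded {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://code.jquery.com/jquery-latest.js");
            URLConnection con = url.openConnection();
            con.addRequestProperty("Accept-Encoding", "gzip,deflate");

            // Pick a decoder based on the Content-Encoding response header.
            String encoding = con.getContentEncoding();
            InputStream body = con.getInputStream();
            if ("gzip".equalsIgnoreCase(encoding)) {
                body = new GZIPInputStream(body);
            } else if ("deflate".equalsIgnoreCase(encoding)) {
                body = new InflaterInputStream(body);
            }

            BufferedReader in = new BufferedReader(new InputStreamReader(body, "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }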
