So I am trying to download this page http://www.csfd.cz/film/895-28-dni-pote/prehled/ . I am using this code:
URL url = new URL("http://www.csfd.cz/film/895-28-dni-pote/prehled/");
try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), Charset.forName("UTF-8")))) {
    String line = br.readLine();
    while (line != null) {
        System.out.println(line);
        line = br.readLine();
    }
}
It worked on some other pages, but for this one it prints weird symbols. For example, the second line I get is: " \\ ? c n ". (It has not been copied exactly as I see it in the Eclipse console.)
As far as I can tell I am decoding the stream as UTF-8, which is the encoding the page uses. In case you are wondering, the page is in Czech. Thanks for the help.
$ curl -D- http://www.csfd.cz/film/895-28-dni-pote/prehled/
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 01 Feb 2016 08:11:36 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
X-Frame-Options: SAMEORIGIN
X-Powered-By: Nette Framework
Vary: X-Requested-With
X-From-Cache: TRUE
Content-Encoding: gzip
▒}I▒▒▒▒^▒▒29B▒▒▒$R▒M▒$nER▒▒4X, @
etc....
Notice the Content-Encoding: gzip header: the content is compressed using gzip, and you will need to decompress it in order to use it. Decoding the compressed bytes directly as UTF-8 is what produces the weird symbols.
Study the classes in java.util.zip, especially GZIPInputStream, which you can wrap around a regular input stream.
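A minimal sketch of that approach, assuming the server reports the compression in the Content-Encoding header as in the curl output above. The class name GzipPageReader and the helper names decodeBody, readAll and fetch are made up for illustration. The stream is wrapped in GZIPInputStream only when the header says gzip, so pages served uncompressed still work:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class GzipPageReader {

    // Hypothetical helper: decompress only when the response header says the body is gzipped.
    static InputStream decodeBody(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Hypothetical helper: read the whole stream as UTF-8 text, line by line.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: open the URL and decode the body according to Content-Encoding.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        // getContentEncoding() returns the Content-Encoding header value, or null if absent.
        return readAll(decodeBody(conn.getInputStream(), conn.getContentEncoding()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java GzipPageReader <url>");
            return;
        }
        System.out.print(fetch(args[0]));
    }
}
```

Run it with the page address as the only argument. Checking the header first, rather than always wrapping, avoids the exception GZIPInputStream throws when the body is not actually gzip data.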