Parsing an Amazon page using jsoup returns a 204 status
Sample page: http://www.amazon.com/gp/offer-listing/1589942140
public void connect( String url ) {
    this.conn = Jsoup.connect( url );
}

/**
 * Executes the request and parses the result.
 * @return true if the page was fetched and parsed, false if an IOException occurred
 */
public boolean parse()
{
    try {
        this.page = this.conn.get();
        return true;
    } catch (IOException ex) {
        // log it here
        System.out.format("Error: %s%n", ex);
        return false;
    }
}
Parsing the page throws the IOException below:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=204, URL=http://www.amazon.com/gp/offer-listing/1589942140
I tried it with the native Java URL class below, and it does not throw an IOException:
try {
    URL myURL = new URL("http://www.amazon.com/gp/offer-listing/1589942140");
    URLConnection myURLConnection = myURL.openConnection();
    myURLConnection.connect();
    System.out.format("%s", myURLConnection.getContentType());
}
catch (MalformedURLException e) {
    // new URL() failed
    System.out.format("Error: %s%n", e);
}
catch (IOException e) {
    // openConnection() or connect() failed
    System.out.format("Error: %s%n", e);
}
Any ideas why this is so?
The following works for me:
System.out.println(Jsoup.connect("http://www.amazon.com/gp/offer-listing/1589942140").userAgent("Mozilla").get().text());
The URL I tried is the one you specified (sample page: http://www.amazon.com/gp/offer-listing/1589942140).
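Since adding userAgent("Mozilla") is the only change, my guess is that the server answers differently depending on the User-Agent header. Note also that the URLConnection snippet in the question never reads the status code, so a 204 would pass unnoticed there. Here is a rough sketch, using only the JDK, of how you could compare status codes with and without a browser-like agent. The class name, the statusFor helper and the local stub server are all made up for illustration; the stub only imitates the behaviour I am assuming (204 unless the agent starts with "Mozilla") and says nothing certain about what Amazon actually does.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UserAgentStatusCheck {

    /**
     * Sends a GET request and returns the HTTP status code.
     * Pass null as userAgent to keep Java's default User-Agent header.
     */
    static int statusFor(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /**
     * Hypothetical stand-in server (an assumption, not Amazon's real logic):
     * answers 200 to agents starting with "Mozilla" and 204 to everything else.
     */
    static HttpServer startStubServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            if (ua != null && ua.startsWith("Mozilla")) {
                byte[] body = "<html><body>offers</body></html>".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } else {
                exchange.sendResponseHeaders(204, -1); // -1 means "no response body"
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startStubServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            System.out.println("default agent: " + statusFor(url, null));
            System.out.println("Mozilla agent: " + statusFor(url, "Mozilla"));
        } finally {
            server.stop(0);
        }
    }
}
```

To try it against the real page, call statusFor with the sample URL instead of the stub address, once with null and once with "Mozilla", and compare the two codes.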