
Parsing an Amazon page using jsoup returns 204 status

Sample page: http://www.amazon.com/gp/offer-listing/1589942140

public void connect( String url ) {        
    this.conn = Jsoup.connect( url );  
}

/**
 * Executes the request and parses the result.
 * @return true if the page was fetched and parsed, false if an IOException occurred
 */
public boolean parse() 
{
    try {
        this.page = this.conn.get();
        return true;
    } catch (IOException ex) {
        // log it here
        System.out.format("Error: %s%n", ex);
        return false;
    }
}    

Parsing the page throws the IOException below:

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=204, URL=http://www.amazon.com/gp/offer-listing/1589942140

I tried it with the native Java URL class below, and it does not throw an IOException:

    try {
        URL myURL = new URL("http://www.amazon.com/gp/offer-listing/1589942140");
        URLConnection myURLConnection = myURL.openConnection();
        myURLConnection.connect();
        System.out.format("%s", myURLConnection.getContentType());
    } 
    catch (MalformedURLException e) { 
        // new URL() failed
        System.out.format("Error: %s%n", e);
    } 
    catch (IOException e) {   
        // openConnection() or connect() failed
        System.out.format("Error: %s%n", e);
    }

Any ideas why this is so?

The following works for me:

    System.out.println(Jsoup.connect("http://www.amazon.com/gp/offer-listing/1589942140").userAgent("Mozilla").get().text());

The URL I tried is the one you specified above (sample page: http://www.amazon.com/gp/offer-listing/1589942140).
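The only change from the code in the question is the explicit User-Agent, so one plausible reading (an assumption, not something the thread confirms) is that the server answers differently depending on that header. The sketch below illustrates the idea without touching Amazon: it starts a local stand-in server that returns 204 unless the User-Agent starts with "Mozilla", then requests it with and without the header using only the JDK. The class name `UserAgentStatusDemo`, the helper names, and the server's rule are all hypothetical and exist only for this illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class UserAgentStatusDemo {

    /**
     * Starts a local stand-in server on a free port.
     * Hypothetical rule: 200 with a body for "Mozilla..." agents, 204 otherwise.
     */
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            if (agent != null && agent.startsWith("Mozilla")) {
                byte[] body = "<html>offers</html>".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } else {
                // -1 means "no response body", which is what 204 No Content requires
                exchange.sendResponseHeaders(204, -1);
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Returns the HTTP status code; the User-Agent is set only when userAgent is non-null. */
    public static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            // Without the header, the JDK sends its own default "Java/<version>" agent
            System.out.println("default agent: " + fetchStatus(url, null));
            System.out.println("Mozilla agent: " + fetchStatus(url, "Mozilla"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against this stand-in server, the first request gets 204 and the second gets 200. Calling getResponseCode() on the real URL in the same way would show whether the plain URLConnection test in the question was also receiving a 204 that it simply never reported.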
