Difficulty getting text from a website using Jsoup

Question

I am trying to get the price from an Amazon link.

This is the HTML I am interested in:

<div class="buying" id="priceBlock">
    <table class="product">
        <tbody>
            <tr id="actualPriceRow">
                <td class="priceBlockLabelPrice" id="actualPriceLabel">Price:</td>
                <td id="actualPriceContent">
                    <span id="actualPriceValue">
                        <b class="priceLarge">
                                $1.99
                        </b>
                    </span>

                </td>
            </tr>
        </tbody>
    </table>
</div>

I am trying to get that $1.99 text.

This is the code I am using to try to get it.

protected Void doInBackground(Void... params) {
    try {
        // Connect to the web site and fetch the page
        Document document = Jsoup.connect(url).get();
        // Select every table with class "product"
        Elements trs = document.select("table.product");

        for (Element tr : trs) {
            Elements tds = tr.select("b.priceLarge");
            Element price1 = tds.first();
            String str1 = price1.text();
            System.out.println(str1);
            String str2 = str1.replaceAll("[$,]", "");
            double aInt = Double.parseDouble(str2);
            System.out.println("Price: " + aInt);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

    return null;
}

Why doesn't this code work?

Answer 1

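You have to set a user agent so that the site does not reject you as a bot. You should also add a timeout to override the default, which may be too short for you. Three seconds is a reasonable choice, but feel free to change it. timeout(0) waits for as long as the server needs to respond; use it if you don't want any limit. You are also doing some odd DOM parsing, which leads to a NullPointerException: tds.first() returns null when a table contains no matching b.priceLarge, so price1.text() throws. Try this: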

String url = "http://www.amazon.com/dp/B00H2T37SO/?tag=stackoverfl08-20";
Document doc = Jsoup
        .connect(url)
        // Send a browser-like user agent so the request is not rejected as a bot
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36")
        // Timeout in milliseconds; 0 would mean no limit
        .timeout(3000)
        .get();

// Select the price elements directly with a single combined selector
Elements prices = doc.select("table.product b.priceLarge");
for (Element pr : prices) {
    String priceWithCurrency = pr.text();
    System.out.println(priceWithCurrency);
    // Strip the dollar sign and thousands separators before parsing
    String priceAsText = priceWithCurrency.replaceAll("[$,]", "");
    double priceAsNumber = Double.parseDouble(priceAsText);
    System.out.println("Price: " + priceAsNumber);
}
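
The parsing step in the loop above still throws a NumberFormatException if the element's text is empty or is not a plain number (for example, if the page shows a message instead of a price). As a rough sketch, that step could be moved into a small defensive helper. The class name PriceParser and the method parsePrice are hypothetical and not part of Jsoup, and BigDecimal is swapped in for double here as an assumption, on the grounds that it avoids binary floating-point rounding for currency values.

```java
import java.math.BigDecimal;
import java.util.Optional;

public class PriceParser {

    // Hypothetical helper (not part of Jsoup): converts text such as "$1,299.00"
    // into a BigDecimal, or returns an empty Optional if the text is missing
    // or is not a plain number.
    public static Optional<BigDecimal> parsePrice(String text) {
        if (text == null) {
            return Optional.empty();
        }
        // Same idea as replaceAll("[$,]", "") in the answer, plus whitespace removal.
        String cleaned = text.replaceAll("[$,\\s]", "");
        if (cleaned.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new BigDecimal(cleaned));
        } catch (NumberFormatException e) {
            // Text such as "Currently unavailable" ends up here.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("  $1.99 "));
        System.out.println(parsePrice("$1,299.00"));
        System.out.println(parsePrice("Currently unavailable"));
    }
}
```

Inside the loop, one way to use it would be parsePrice(pr.text()).ifPresent(p -> System.out.println("Price: " + p)), which simply skips elements whose text is not a price.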

Solution 1
1 accepted 2014-12-18 04:49:48
