Trouble grabbing text from a website using Jsoup
I'm trying to grab a price from an Amazon link.
Here's the HTML I'm focusing on:
<div class="buying" id="priceBlock">
  <table class="product">
    <tbody>
      <tr id="actualPriceRow">
        <td class="priceBlockLabelPrice" id="actualPriceLabel">Price:</td>
        <td id="actualPriceContent">
          <span id="actualPriceValue">
            <b class="priceLarge">
              $1.99
            </b>
          </span>
        </td>
      </tr>
    </tbody>
  </table>
</div>
I'm trying to grab that $1.99 text.
Here's the code I'm using to try to grab it:
protected Void doInBackground(Void... params) {
    try {
        // Connect to the web site
        Document document = Jsoup.connect(url).get();
        // Select the product table(s)
        Elements trs = document.select("table.product");
        for (Element tr : trs)
        {
            Elements tds = tr.select("b.priceLarge");
            Element price1 = tds.first();
            String str1 = price1.text();
            System.out.println(str1);
            String str2 = str1.replaceAll( "[$,]", "" );
            double aInt = Double.parseDouble(str2);
            System.out.println("Price: " + aInt);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}
Why isn't this code working?
You have to set a user agent so the site won't reject you as a bot. You should also set an explicit timeout to override the default, which might be too short for you. Three seconds is a good option, but feel free to change it. timeout(0) removes the limit and waits as long as the server needs to respond; use that if you don't want a limit. The DOM parsing in your code is also a bit odd, and it can cause a NullPointerException: first() returns null when nothing matches, so calling text() on the result fails. Try this:
String url = "http://www.amazon.com/dp/B00H2T37SO/?tag=stackoverfl08-20";
Document doc = Jsoup
        .connect(url)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36")
        .timeout(3000)
        .get();
Elements prices = doc.select("table.product b.priceLarge");
for (Element pr : prices)
{
    String priceWithCurrency = pr.text();
    System.out.println(priceWithCurrency);
    String priceAsText = priceWithCurrency.replaceAll( "[$,]", "" );
    double priceAsNumber = Double.parseDouble(priceAsText);
    System.out.println("Price: " + priceAsNumber);
}
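The price-cleaning step above can also be pulled into a small helper that does not throw when the text is not a number (for example, when the page shows a message instead of a price). This is only a sketch: the class name PriceParser and the method parsePrice are hypothetical names made up for illustration, not part of jsoup, and it uses BigDecimal instead of double on the assumption that you want exact decimal values for money.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical helper class for illustration; not part of jsoup.
class PriceParser {

    // Turns text such as "$1,299.99" into a BigDecimal.
    // Returns Optional.empty() when the text is null or not a number.
    static Optional<BigDecimal> parsePrice(String priceWithCurrency) {
        if (priceWithCurrency == null) {
            return Optional.empty();
        }
        // Same cleanup as in the answer: drop '$' and ',', then trim whitespace.
        String cleaned = priceWithCurrency.replaceAll("[$,]", "").trim();
        try {
            return Optional.of(new BigDecimal(cleaned));
        } catch (NumberFormatException e) {
            // Covers empty strings and non-numeric text.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("$1.99"));
        System.out.println(parsePrice(" $1,299.00 "));
        System.out.println(parsePrice("Currently unavailable"));
    }
}
```

In the loop above you would call it as PriceParser.parsePrice(pr.text()) and only print the price when the returned Optional is present.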