Parse nodeList value using Jsoup
http://www.smbs.biz/ExRate/StdExRate.jsp
On this website, I tried to parse the currency values in the table.
The value below in the table is the one I want to extract.
In the developer tools, I can see the value only in the 'Elements' panel, not in the 'Sources' panel. I guess the data is loaded via AJAX? How can I extract the data using Jsoup?
Here's the code I tried, which failed:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

Document doc = null;
try {
    doc = Jsoup.connect("http://www.smbs.biz/ExRate/StdExRate.jsp").get();
} catch (IOException e) {
    e.printStackTrace();
}
//Elements exchangeRateElement = doc.select(".brb0 td:nth-child(3)").eq(1);
Element exchangeRateElement = doc.getElementsByClass("brb0").get(10);
String cur = null;
// Note: doc.childNodes() returns only the top-level nodes (doctype, <html>), not the table cells
for (Node node : doc.childNodes()) {
    System.out.println("node : " + node);
    if (node instanceof TextNode) {
        cur = ((TextNode) node).text();
        break;
    }
}
When we load the page in a browser with JavaScript disabled, we notice that the table remains empty.
After enabling JavaScript and monitoring the Network tab (Chrome DevTools, F12) while reloading, we see a request:
http://www.smbs.biz/ExRate/StdExRate_xml.jsp?arr_value=USD_2016-09-13_2016-10-05
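The `arr_value` parameter of that request appears to encode the currency and the date range as `<CURRENCY>_<start>_<end>` with ISO dates. That layout is inferred from this single observed request, not from any documentation of the site, so treat it as an assumption. A minimal sketch of building such a URL with `java.time` (the class and method names are hypothetical):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RateUrlSketch {
    // XML endpoint observed in the browser's Network tab.
    static final String XML_ENDPOINT = "http://www.smbs.biz/ExRate/StdExRate_xml.jsp";

    // Assumption: arr_value = "<CURRENCY>_<yyyy-MM-dd start>_<yyyy-MM-dd end>",
    // inferred from one observed request only.
    public static String buildUrl(String currency, LocalDate start, LocalDate end) {
        DateTimeFormatter iso = DateTimeFormatter.ISO_LOCAL_DATE; // formats as yyyy-MM-dd
        return XML_ENDPOINT + "?arr_value=" + currency + "_" + start.format(iso) + "_" + end.format(iso);
    }

    public static void main(String[] args) {
        // Rebuilds the request URL seen above
        System.out.println(buildUrl("USD", LocalDate.of(2016, 9, 13), LocalDate.of(2016, 10, 5)));
    }
}
```

Using `LocalDate` rather than string concatenation keeps the month and day zero-padded, which the observed URL uses.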
The response contains a chart with the needed information:
<chart
[...]
<set color='c93749' label='16.09.13' value='1110.6' />
<set color='c93749' label='16.09.19' value='1112.3' />
[...]
<set color='c93749' label='16.10.04' value='1102' />
<set color='c93749' label='16.10.05' value='1105.1' />
<styles>
[...]
</styles>
</chart>
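Because the response is XML rather than HTML, each `<set>` element can be read as a label/value pair. The sketch below uses the JDK's built-in DOM parser (`javax.xml.parsers`) instead of Jsoup, purely so it runs without extra dependencies; the class name is hypothetical, and it assumes the real response is well-formed XML, which has not been verified:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ChartParseSketch {
    // Returns label -> value for every <set> element, preserving document order.
    public static Map<String, String> parseSets(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList sets = doc.getElementsByTagName("set");
            Map<String, String> rates = new LinkedHashMap<>();
            for (int i = 0; i < sets.getLength(); i++) {
                Element set = (Element) sets.item(i);
                rates.put(set.getAttribute("label"), set.getAttribute("value"));
            }
            return rates;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // A strict XML parser rejects malformed input, unlike Jsoup's lenient parser
            throw new IllegalArgumentException("Could not parse chart XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<chart>"
                + "<set color='c93749' label='16.09.13' value='1110.6' />"
                + "<set color='c93749' label='16.09.19' value='1112.3' />"
                + "<styles></styles>"
                + "</chart>";
        parseSets(sample).forEach((label, value) -> System.out.println(label + ": " + value));
    }
}
```

If the live response turns out not to be well-formed, Jsoup's lenient parsing (as in the example code of this answer) is the safer choice.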
Before we request the chart, we need to grab the JSESSIONID session cookie from the first page load and add it to the request.
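In Jsoup, `res.cookies()` returns the cookies as a `Map<String, String>`, and `.cookies(...)` sends them back with the next request. If you reproduce the request with a different HTTP client, you need to build the `Cookie` header yourself. A minimal sketch (hypothetical class name; the session id in `main` is made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CookieHeaderSketch {
    // Joins cookies as "name=value; name2=value2", the format of an HTTP Cookie request header.
    public static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            joiner.add(cookie.getKey() + "=" + cookie.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("JSESSIONID", "ABC123"); // hypothetical session id
        System.out.println(toCookieHeader(cookies)); // prints JSESSIONID=ABC123
    }
}
```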
Example Code
import java.io.IOException;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";
try {
    // First request: load the page to get the session cookie (res.cookies()) and the date range
    Response res = Jsoup.connect("http://www.smbs.biz/ExRate/StdExRate.jsp").timeout(10000)
            .userAgent(userAgent).method(Method.GET).header("Host", "www.smbs.biz").execute();
    Document doc = res.parse();
    // The date inputs use dots as separators; the XML endpoint expects dashes
    String startDate = doc.getElementById("startDate").attr("value").replace(".", "-");
    String endDate = doc.getElementById("endDate").attr("value").replace(".", "-");
    // Second request: fetch the chart XML, sending the session cookie along
    doc = Jsoup.connect("http://www.smbs.biz/ExRate/StdExRate_xml.jsp?arr_value=USD_" + startDate + "_" + endDate)
            .userAgent(userAgent).timeout(10000).header("Host", "www.smbs.biz").cookies(res.cookies())
            .header("Connection", "keep-alive")
            .referrer("http://www.smbs.biz/ExRate/StdExRate.jsp").get();
    Elements elements = doc.select("chart > set");
    for (Element element : elements) {
        System.out.println(element.attr("label") + ": " + element.attr("value"));
    }
    // The last <set> holds the most recent rate
    Element currentRateElement = elements.last();
    System.out.println("Current rate for " + currentRateElement.attr("label") + ": " + currentRateElement.attr("value"));
} catch (IOException e) {
    e.printStackTrace();
}
Output
16.09.13: 1110.6
16.09.19: 1112.3
16.09.20: 1120
16.09.21: 1119.5
16.09.22: 1116.8
16.09.23: 1103.1
16.09.26: 1104.2
16.09.27: 1106.9
16.09.28: 1103.5
16.09.29: 1095.7
16.09.30: 1096.3
16.10.04: 1102
16.10.05: 1105.1
Current rate for 16.10.05: 1105.1
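The labels in the output look like `yy.MM.dd` dates (`16.10.05` for 2016-10-05). That format is inferred from the output above, not documented by the site. Parsing them into `LocalDate` and the rates into `BigDecimal` gives a date-sorted series, so the latest rate no longer depends on the order of the `<set>` elements. A minimal sketch (hypothetical class and method names):

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class LatestRateSketch {
    // Assumption: labels are yy.MM.dd; "yy" maps to years 2000-2099.
    private static final DateTimeFormatter LABEL_FORMAT = DateTimeFormatter.ofPattern("yy.MM.dd");

    // Converts raw label/value strings into a date-sorted map of exact decimal rates.
    public static TreeMap<LocalDate, BigDecimal> toSeries(Map<String, String> raw) {
        TreeMap<LocalDate, BigDecimal> series = new TreeMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            series.put(LocalDate.parse(e.getKey(), LABEL_FORMAT), new BigDecimal(e.getValue()));
        }
        return series;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("16.10.04", "1102");
        raw.put("16.10.05", "1105.1");
        raw.put("16.09.13", "1110.6"); // deliberately out of order
        Map.Entry<LocalDate, BigDecimal> latest = toSeries(raw).lastEntry();
        System.out.println("Current rate for " + latest.getKey() + ": " + latest.getValue());
    }
}
```

`BigDecimal` keeps values such as `1105.1` exact, which matters if the rates are used in further currency arithmetic.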