How to extract HTML table data using Java from a website?
I've got this website: www.bloomberg.com
I want to extract the name and price for each company listed. I've looked around, but the following code I found doesn't work:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class testing {
    public static void main(String[] args) {
        Document doc;
        try {
            doc = Jsoup.connect("http://http://www.bloomberg.com/markets/stocks/movers/ftse-100/").get();
            for (Element table : doc.select("table[class=index_members_table dual_border_data_table market_sortable_table alt_rows_stat_table]")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    System.out.println(tds.get(0).text() + "->" + tds.get(1).text());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
(1) The URL is invalid: it has an extra scheme (http://http://).
(2) Some <tr>s have no <td>s (e.g. header rows), so tds.get(0) throws an IndexOutOfBoundsException for those rows.
::::
doc = Jsoup.connect("http://www.bloomberg.com/markets/stocks/movers/ftse-100/").get();
for (Element table : doc
        .select("table[class=index_members_table dual_border_data_table market_sortable_table alt_rows_stat_table]")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        if (tds.isEmpty()) { // Header <tr> with only <th>s
            continue;
        }
        System.out.println(tds.get(0).text() + "->" + tds.get(1).text());
    }
}
::::
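The same fix can be tried without network access. The following is only a sketch: the inline HTML, the class name TableRowsSketch, the helper extractRows, and the company names are all made up for illustration. It also uses the shorter class selector table.index_members_table instead of the exact-match [class=...] attribute selector. That assumes the table carries that class, which may not hold for the live page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableRowsSketch {

    // Hypothetical helper: returns "first cell->second cell" for every data row
    // of the tables matched by tableSelector.
    static List<String> extractRows(Document doc, String tableSelector) {
        List<String> out = new ArrayList<>();
        for (Element table : doc.select(tableSelector)) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() < 2) { // header rows (<th> only) or rows with too few cells
                    continue;
                }
                out.add(tds.get(0).text() + "->" + tds.get(1).text());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for the real page.
        String html = "<table class=\"index_members_table alt_rows_stat_table\">"
                + "<tr><th>Name</th><th>Price</th></tr>"
                + "<tr><td>Example A</td><td>10.00</td></tr>"
                + "<tr><td>Example B</td><td>20.50</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        for (String line : extractRows(doc, "table.index_members_table")) {
            System.out.println(line);
        }
    }
}
```

Checking tds.size() < 2 rather than tds.isEmpty() also skips rows that have a single cell, since tds.get(1) would fail on them. A class selector such as table.index_members_table matches on one class and keeps working if the page adds or reorders the other classes, whereas [class=...] requires the whole attribute value to match.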