Trying to use jSoup to scrape data from a table
First-time poster and fairly new coder, so please go easy on me. I'm trying to use jSoup to scrape data from a table. However, I'm having a couple of problems:
1) I'm using NetBeans. I get a "stop" error on line 30 (Elements tds...) that says: cannot find symbol, symbol: method getElementsByTag. I'm confused because I thought I imported the correct package, and I use the same code a couple of lines above and get no error.
2) When I run the code, I get an error that says:
Exception in thread "main" java.lang.NullPointerException
at mytest.JsoupTest1.main(JsoupTest1.java:26)
I thought this means that a variable with a value of null is being used. Did I incorrectly enter the "row" variable in my for loop above?
Here's my code. I truly appreciate any help!
package mytest;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest1 {
    private static Object row;

    public static void main(String[] args) {
        Document doc = null;
        try {
            doc = Jsoup.connect( "http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2015&month=0&season1=2015&ind=0&team=18&rost=0&age=0&filter=&players=0" ).get();
        }
        catch (IOException ioe) {
            ioe.printStackTrace();
        }
        Element table = doc.getElementById( "LeaderBoard1_dg1_ct100" );
        Elements rows = table.getElementsByTag( "tr" );
        for( Element row:rows ) {
        }
        Elements tds = row.getElementsByTag( "td" );
        for( int i=0; i < tds.size(); i++ ) {
            System.out.println(tds.get(i).text());
        }
    }
}
Welcome to StackOverflow.
This works.
Document doc = null;
try {
    doc = Jsoup
            .connect(
                    "http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2015&month=0&season1=2015&ind=0&team=18&rost=0&age=0&filter=&players=0")
            .get();
}
catch (IOException ioe) {
    ioe.printStackTrace();
}
// Note the id ends in "ctl00" (lowercase L, two zeros), not "ct100".
Element table = doc.getElementById("LeaderBoard1_dg1_ctl00");
Elements rows = table.getElementsByTag("tr");
for (Element row : rows) {
    // The cells are read inside the loop, where the loop variable row is in scope.
    Elements tds = row.getElementsByTag("td");
    for (int i = 0; i < tds.size(); i++) {
        System.out.println(tds.get(i).text());
    }
}
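If you want a wrong id to show up as a clear result instead of a NullPointerException, you can check what getElementById returned before using it. Here is a minimal sketch of that idea. It assumes jsoup is on the classpath and parses a small inline HTML string rather than the live page; the class name TableGuardDemo, the method countCells, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableGuardDemo {
    // Hypothetical helper: returns the number of <td> cells in the table
    // with the given id, or -1 if no element has that id.
    public static int countCells(String html, String tableId) {
        Document doc = Jsoup.parse(html);
        Element table = doc.getElementById(tableId);
        if (table == null) {
            // getElementById returns null when nothing matches the id.
            return -1;
        }
        int cells = 0;
        for (Element row : table.getElementsByTag("tr")) {
            cells += row.getElementsByTag("td").size();
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not the real FanGraphs page.
        String html = "<table id=\"LeaderBoard1_dg1_ctl00\">"
                + "<tr><td>A</td><td>B</td></tr>"
                + "<tr><td>C</td><td>D</td></tr>"
                + "</table>";
        System.out.println(countCells(html, "LeaderBoard1_dg1_ctl00")); // prints 4
        System.out.println(countCells(html, "LeaderBoard1_dg1_ct100")); // prints -1
    }
}
```

In real scraping code you would probably print a message or throw an exception that names the missing id instead of returning -1.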
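There are three problems with your code.
1) Instead of LeaderBoard1_dg1_ct100, use LeaderBoard1_dg1_ctl00. You mistook the l for a 1. With the wrong id, getElementById returns null, and calling getElementsByTag on that null table is what throws the NullPointerException.
2) You declared the field Object row. No need for this one.
3) You read the cells of the row outside the for loop. And because you had the Object row variable, the compiler did not report row as undefined there, thus hiding the problem. Instead it resolved row to the field, whose type Object has no getElementsByTag method, which is why you got the "cannot find symbol" error.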
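The scoping issue can be reproduced without jsoup. The following is a small sketch in plain Java; the class ShadowDemo and its method names are made up for illustration, and a String array stands in for the table rows.

```java
public class ShadowDemo {
    // Stand-in for the unnecessary field from the question.
    private static Object row;

    // Mirrors the mistake: the loop variable "row" exists only inside the
    // loop body, so the "row" used after the loop is the static field,
    // which was never assigned and is still null.
    public static Object rowSeenAfterLoop(String[] rows) {
        for (String row : rows) {
            // the loop-local row is visible only here
        }
        return row; // resolves to the static field
    }

    // The fix: do the work inside the loop, where the loop variable is in scope.
    public static int countInsideLoop(String[] rows) {
        int count = 0;
        for (String row : rows) {
            if (!row.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] rows = {"a", "b"};
        System.out.println(rowSeenAfterLoop(rows)); // prints null
        System.out.println(countInsideLoop(rows));  // prints 2
    }
}
```

If you delete the field, rowSeenAfterLoop no longer compiles, and the compiler points straight at the misplaced use of row.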