Trying to use jSoup to scrape data from a table

First-time poster and fairly new coder, so please go easy on me. I'm trying to use jSoup to scrape data from a table. However, I'm having a couple of problems:

1) I'm using NetBeans. I get a "stop" error on Line 30 (Elements tds...) that says "cannot find symbol, symbol: method getElementsByTag". I'm confused because I thought I imported the correct package, and I use the same code a couple of lines above and get no error.

2) When I run the code, I get an error that says:

Exception in thread "main" java.lang.NullPointerException
at mytest.JsoupTest1.main(JsoupTest1.java:26)

I thought this means that a variable with a value of null is being used. Did I incorrectly enter the "row" variable in my for loop above?

Here's my code. I truly appreciate any help!

package mytest;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest1 {
    private static Object row;


    public static void main(String[] args) {
        Document doc = null;
        try {
            doc = Jsoup.connect( "http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2015&month=0&season1=2015&ind=0&team=18&rost=0&age=0&filter=&players=0" ).get();
        }

        catch (IOException ioe) {
            ioe.printStackTrace();
        }

        Element table = doc.getElementById( "LeaderBoard1_dg1_ct100" );

        Elements rows = table.getElementsByTag( "tr" );
        for( Element row:rows ) {
        }

        Elements tds = row.getElementsByTag( "td" );
        for( int i=0; i < tds.size(); i++ ) {
            System.out.println(tds.get(i).text());
        }
    }
}

Welcome to StackOverflow.

This works.

Document doc = null;
try {
    doc = Jsoup
            .connect(
                    "http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2015&month=0&season1=2015&ind=0&team=18&rost=0&age=0&filter=&players=0")
            .get();
}
catch (IOException ioe) {
    ioe.printStackTrace();
}

Element table = doc.getElementById("LeaderBoard1_dg1_ctl00");
Elements rows = table.getElementsByTag("tr");
for (Element row : rows) {
    Elements tds = row.getElementsByTag("td");
    for (int i = 0; i < tds.size(); i++) {
        System.out.println(tds.get(i).text());
    }
}
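
The snippet above still dereferences doc and table without checking them. If the request fails, doc stays null, and getElementById returns null when no element has the requested id, so either case would raise a NullPointerException again. Here is a minimal defensive sketch of the table lookup. The class name TableCells, the helper cellTexts, and the inline sample HTML are assumptions made for illustration; they are not part of the original code or of the real page.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; the name and the sample HTML are illustrative only.
final class TableCells {

    // Returns the text of every <td> inside the table with the given id,
    // or an empty list when no element has that id.
    static List<String> cellTexts(Document doc, String tableId) {
        List<String> cells = new ArrayList<>();
        Element table = doc.getElementById(tableId); // null when the id is not found
        if (table == null) {
            return cells;
        }
        for (Element row : table.getElementsByTag("tr")) {
            for (Element td : row.getElementsByTag("td")) {
                cells.add(td.text());
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Inline stand-in for the downloaded page, so the sketch runs offline.
        String html = "<table id=\"LeaderBoard1_dg1_ctl00\">"
                + "<tr><td>Player A</td><td>10</td></tr>"
                + "<tr><td>Player B</td><td>7</td></tr>"
                + "</table>";
        Document doc = Jsoup.parse(html);
        System.out.println(cellTexts(doc, "LeaderBoard1_dg1_ctl00")); // correct id
        System.out.println(cellTexts(doc, "LeaderBoard1_dg1_ct100")); // mistyped id
    }
}
```

With the mistyped id the helper returns an empty list instead of throwing, which makes a wrong id easier to spot than a stack trace does.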

There are three problems with your code.

  1. The id you are using is wrong. Instead of LeaderBoard1_dg1_ct100, use LeaderBoard1_dg1_ctl00. You mistook the letter l for the digit 1.
  2. The second problem is the Object row field. There is no need for it, so remove it.
  3. You iterated over the cells (tds) outside of the for loop over the rows. Because the Object row field existed, the compiler did not report row as an undefined variable, which hid the real problem.
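
The third point can be reproduced without jSoup. The sketch below is only an illustration: the class ScopeDemo and its method names are hypothetical, and String stands in for jSoup's Element type. It shows that a loop variable stops existing when the loop ends, so the same name used afterwards silently refers to the never-assigned field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; String stands in for jSoup's Element type.
final class ScopeDemo {
    // A field like this lets code placed after the loop resolve the name "row".
    // It is never assigned, so it stays null.
    private static Object row;

    // Mirrors the original structure: the loop body is empty, and the
    // loop variable "row" stops existing once the loop ends.
    static Object rowSeenAfterLoop(List<String> rows) {
        for (String row : rows) {
            // nothing happens here
        }
        // "row" now refers to the static field, not to any list element.
        return row;
    }

    // Corrected structure: the work happens inside the loop body.
    static List<String> rowsSeenInsideLoop(List<String> rows) {
        List<String> seen = new ArrayList<>();
        for (String row : rows) {
            seen.add(row);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row 1", "row 2");
        System.out.println(rowSeenAfterLoop(rows));   // the field, which is null
        System.out.println(rowsSeenInsideLoop(rows)); // the actual rows
    }
}
```

Without the field, the use of row after the loop would not compile at all, which would have pointed straight at the misplaced code.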
