
Trying to use jSoup to scrape data from a table

First-time poster and fairly new coder, so please go easy on me. I'm trying to use jSoup to scrape data from a table, but I'm running into a couple of problems:

1) I'm using NetBeans. I get a "stop" error on Line 30 (Elements tds...) that says "cannot find symbol, symbol: method getElementsByTag". I'm confused because I thought I imported the correct package, and I use the same call a couple of lines above and get no error.

2) When I run the code, I get an error that says:

Exception in thread "main" java.lang.NullPointerException
at mytest.JsoupTest1.main(JsoupTest1.java:26)

I thought this means that a variable whose value is null is being used. Did I use the "row" variable from my for loop above incorrectly?

Here's my code. I truly appreciate any help!

package mytest;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupTest1 {
    private static Object row;


    public static void main(String[] args) {
        Document doc = null;
        try {
            doc = Jsoup.connect( "http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2015&month=0&season1=2015&ind=0&team=18&rost=0&age=0&filter=&players=0" ).get();
        }

        catch (IOException ioe) {
            ioe.printStackTrace();
        }

        Element table = doc.getElementById( "LeaderBoard1_dg1_ct100" );

        Elements rows = table.getElementsByTag( "tr" );
        for( Element row:rows ) {
        }

        Elements tds = row.getElementsByTag( "td" );
        for( int i=0; i < tds.size(); i++ ) {
            System.out.println(tds.get(i).text());
        }
    }
}

Welcome to Stack Overflow.

This version works:

Document doc = null;
try {
    doc = Jsoup
            .connect(
                    "http://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type=8&season=2015&month=0&season1=2015&ind=0&team=18&rost=0&age=0&filter=&players=0")
            .get();
}
catch (IOException ioe) {
    ioe.printStackTrace();
}

// Note the id ends in "ctl00" (lowercase L), not "ct100".
Element table = doc.getElementById("LeaderBoard1_dg1_ctl00");
Elements rows = table.getElementsByTag("tr");
for (Element row : rows) {
    // Read the cells inside the loop, where "row" is in scope.
    Elements tds = row.getElementsByTag("td");
    for (int i = 0; i < tds.size(); i++) {
        System.out.println(tds.get(i).text());
    }
}
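
One caveat: the snippet still assumes every lookup succeeds. In jsoup, getElementById returns null when no element has the given id, and doc itself stays null if the IOException is caught, so a wrong id or a failed request surfaces later as a NullPointerException. A minimal sketch of failing early with a readable message follows. To keep it runnable without jsoup or network access, a plain Map stands in for the parsed page; NullGuardDemo, PAGE and requireElement are hypothetical names made up for this illustration, not part of jsoup.

```java
import java.util.HashMap;
import java.util.Map;

public class NullGuardDemo {
    // Hypothetical stand-in for a parsed page: element id -> element text.
    // Like jsoup's getElementById, Map.get returns null for an unknown key.
    static final Map<String, String> PAGE = new HashMap<>();
    static {
        PAGE.put("LeaderBoard1_dg1_ctl00", "<table>");
    }

    // Returns the element, or throws with a message that names the missing id.
    static String requireElement(String id) {
        String element = PAGE.get(id);
        if (element == null) {
            throw new IllegalStateException("No element with id '" + id + "'");
        }
        return element;
    }

    public static void main(String[] args) {
        // Correct id: the lookup succeeds.
        System.out.println(requireElement("LeaderBoard1_dg1_ctl00"));
        try {
            // The misspelled id from the question: fails with a clear message.
            requireElement("LeaderBoard1_dg1_ct100");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With real jsoup code the same idea would be an explicit null check on doc and on table before calling getElementsByTag on them.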

There are three problems with your code.

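  1. The id you are using is wrong. Instead of LeaderBoard1_dg1_ct100 use LeaderBoard1_dg1_ctl00 . You mistook the l (lowercase L) for a 1 . Because no element has the misspelled id, getElementById returns null, and calling table.getElementsByTag on that null is what throws the NullPointerException.
  2. The second problem is the field private static Object row . There is no need for it. Remove it.
  3. You read the cells (row.getElementsByTag) outside of the for loop that iterates over the rows, so the loop body was empty. Outside the loop, row no longer refers to the loop variable but to the Object row field. Because that field existed, the compiler did not complain about an undefined variable, which hid the real problem. Object has no getElementsByTag method, which is why NetBeans reported "cannot find symbol".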
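
Points 2 and 3 both come down to scoping. As a rough illustration in plain Java (no jsoup needed; RowScopeDemo, collectCells and fieldIsStillNull are names invented for this sketch), a for-each variable called row shadows a static field of the same name only inside the loop body. Once the loop ends, the name refers to the field again:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowScopeDemo {
    // Stand-in for the question's "private static Object row;" field.
    // It is never assigned, so it stays null.
    private static Object row;

    // Collects every cell while the loop variable is still in scope.
    static List<String> collectCells(List<List<String>> rows) {
        List<String> cells = new ArrayList<>();
        for (List<String> row : rows) { // this local "row" shadows the field
            cells.addAll(row);          // here "row" is the current table row
        }
        // Past this point "row" would mean the static Object field again.
        return cells;
    }

    // Shows that the loop never touched the field.
    static boolean fieldIsStillNull() {
        return row == null;
    }

    public static void main(String[] args) {
        List<List<String>> table = Arrays.asList(
                Arrays.asList("r1c1", "r1c2"),
                Arrays.asList("r2c1", "r2c2"));
        System.out.println(collectCells(table));
        System.out.println(fieldIsStillNull());
    }
}
```

Deleting the field makes any use of row outside the loop an immediate "cannot find symbol: variable row" error, which points directly at the misplaced code.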

