
JSoup: parsing a text file containing an HTML table with Java

I am really unsure how to extract the information I need to put into a database. The code below just prints the whole file.

File input = new File("shipMove.txt");
Document doc = Jsoup.parse(input, null);    
System.out.println(doc.toString());

My HTML is here, starting at line 61. I need to get the items under the column headings and also grab the MMSI number, which is not under a column heading but inside the href attribute of a link. I haven't used JSoup for anything other than getting the HTML from the web page. The only tutorials I can find use PHP, and I'd rather not use it.

To get that information, the best way is to use Jsoup's selector API. Using selectors, your code will look something like this (pseudocode!):

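import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Passing null as the charset lets Jsoup detect the encoding (falling back to UTF-8)
File input = new File("shipMove.txt");
Document doc = Jsoup.parse(input, null);

// Find all elements that match a CSS-like selector
Elements matches = doc.select("<your selector here>");

for( Element element : matches )
{
    // do something with found elements
}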

There's good documentation available here: Use selector-syntax to find elements. If you still get stuck, please describe your problem.

Here are some hints for the selectors you can use:

// Select the table with class 'shipinfo'
Elements tables = doc.select("table.shipinfo");

// Iterate over all tables found (since there's only one, you can use first() instead)
for( Element element : tables )
{
    // Select all 'td' tags of that table
    Elements tdTags = element.select("td");

    // Iterate over all 'td' tags found
    for( Element td : tdTags )
    {
        // Print its text if not empty
        final String text = td.text();

        if( !text.isEmpty() )
        {
            System.out.println(text);
        }
    }
}
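
The hints above print every cell, but the question also asks for the MMSI number stored in a link's href. Here is a minimal sketch of one way to collect both per row. The question's HTML is not shown, so the sample markup, the class name ShipTableSketch, the helper extractRows and the "mmsi=" query parameter are all assumptions made for illustration; adjust the selectors and the pattern to match the real file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ShipTableSketch {

    // Hypothetical markup: the real shipMove.txt may be laid out differently.
    static final String SAMPLE_HTML =
        "<table class=\"shipinfo\">"
      + "<tr><th>Name</th><th>Destination</th></tr>"
      + "<tr><td><a href=\"ship.php?mmsi=123456789\">EXAMPLE ONE</a></td><td>PORT A</td></tr>"
      + "<tr><td><a href=\"ship.php?mmsi=987654321\">EXAMPLE TWO</a></td><td>PORT B</td></tr>"
      + "</table>";

    // Assumption: the MMSI appears in the href as a nine-digit 'mmsi' query parameter.
    static final Pattern MMSI = Pattern.compile("mmsi=(\\d{9})");

    /** Returns one list per data row: the MMSI first, then the text of each cell. */
    static List<List<String>> extractRows(Document doc) {
        List<List<String>> rows = new ArrayList<>();

        // Only rows that contain 'td' cells, which skips the header row
        for (Element row : doc.select("table.shipinfo tr:has(td)")) {
            List<String> fields = new ArrayList<>();

            // First link in the row with an href attribute, or null if there is none
            Element link = row.selectFirst("a[href]");
            String mmsi = "";
            if (link != null) {
                Matcher m = MMSI.matcher(link.attr("href"));
                if (m.find()) {
                    mmsi = m.group(1);
                }
            }
            fields.add(mmsi);

            // Text of every cell in this row
            for (Element td : row.select("td")) {
                fields.add(td.text());
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        for (List<String> fields : extractRows(doc)) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```

To run it against the real file, replace Jsoup.parse(SAMPLE_HTML) with Jsoup.parse(new File("shipMove.txt"), null). Each returned list holds the values for one table row, which is a convenient shape for inserting into a database later.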


 