
How to extract/parse this HTML table in Java using jsoup?

I roughly know how to parse HTML tables with jsoup, but the table I'm working with is buried somewhere in the page and I don't know how to get to it: https://finance.yahoo.com/calendar/earnings?symbol=nflx

It's the one with the earnings dates.

I know that I first have to do:

Document doc = Jsoup.connect("https://finance.yahoo.com/calendar/earnings?symbol=nflx").get();

Then in a loop:

for (Element table : doc.select("some string")) {

How do I find the string to pass to select() so that it matches this table?

You don't actually need to loop over every match with for (Element table : doc.select("some string")) { ... }; you can get the table directly from the document.

To get the table, first inspect the page with the Developer Tools of your browser (most modern browsers include them) and identify the element you want. In your case, the specific table is:

<table class="data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)" data-reactid="4">

The code to get to it is:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect("https://finance.yahoo.com/calendar/earnings?symbol=nflx")
                    .timeout(600000) // 600,000 ms (10 minutes), because my connection is slow
                    .get();
Elements tableDiv = doc.getElementsByAttributeValue("class", "data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)");

This gives you an org.jsoup.select.Elements collection that you can process the same way: iterate over it and get the elements inside the table with the Element methods getElementsByTag, getElementsByClass, getElementsByAttribute and the other getElementsBy... variants.
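A minimal sketch of that navigation, assuming the table has a plain tbody > tr > td layout: it walks the Elements collection with getElementsByTag and collects the text of each body row. To keep it runnable offline it parses a hypothetical inline HTML snippet with Jsoup.parse instead of downloading the page; the class name TableRowsDemo, the method extractRows and the sample cell values are illustrative, not part of the original answer or of the real Yahoo Finance markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class TableRowsDemo {

    // Hypothetical helper: collects the text of every <td> cell, row by row,
    // from the <tbody> of each table in the given collection.
    static List<List<String>> extractRows(Elements tables) {
        List<List<String>> rows = new ArrayList<>();
        for (Element table : tables) {
            for (Element tbody : table.getElementsByTag("tbody")) {
                for (Element tr : tbody.getElementsByTag("tr")) {
                    List<String> cells = new ArrayList<>();
                    for (Element td : tr.getElementsByTag("td")) {
                        cells.add(td.text()); // visible text of the cell, tags stripped
                    }
                    rows.add(cells);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup standing in for the downloaded page;
        // the real columns and values on Yahoo Finance may differ.
        String html = "<table class=\"data-table\">"
                + "<thead><tr><th>Symbol</th><th>Earnings Date</th></tr></thead>"
                + "<tbody><tr><td>NFLX</td><td>(date 1)</td></tr>"
                + "<tr><td>NFLX</td><td>(date 2)</td></tr></tbody></table>";
        Document doc = Jsoup.parse(html);
        Elements tables = doc.getElementsByAttributeValue("class", "data-table");

        for (List<String> row : extractRows(tables)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because only the tbody is searched, the header cells in thead are skipped; if the table contained nested tables, their rows would also be picked up, which is fine for a flat table like this one.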

Here is an example of how you can print only the body of that table:

tableDiv.forEach(table -> table.getElementsByTag("tbody")
                               .forEach(tbody -> System.out.println(tbody)));
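If you would rather keep the doc.select("some string") loop from the question, the missing string can be a CSS selector built from one of the table's classes. The sketch below swaps in that technique (a class selector, table.data-table) instead of getElementsByAttributeValue, which only matches when the whole class attribute equals the given string (jsoup compares it case-insensitively after trimming). It assumes the page still serves that class in its static HTML; the names TableSelectorDemo, TABLE_SELECTOR and findTables, and the extra class in the sample markup, are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableSelectorDemo {

    // CSS selector: any <table> that has the class "data-table",
    // regardless of which other classes share the same attribute.
    static final String TABLE_SELECTOR = "table.data-table";

    static Elements findTables(Document doc) {
        return doc.select(TABLE_SELECTOR);
    }

    public static void main(String[] args) {
        // Hypothetical markup: the class string from the answer plus one extra class,
        // to show why matching the entire attribute value is brittle.
        String html = "<table class=\"data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c) extra\">"
                + "<tbody><tr><td>NFLX</td></tr></tbody></table>";
        Document doc = Jsoup.parse(html);

        // The exact attribute match no longer finds the table once the value changes...
        Elements exact = doc.getElementsByAttributeValue("class",
                "data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)");
        // ...while the class selector still does.
        Elements bySelector = findTables(doc);

        System.out.println("exact match: " + exact.size());
        System.out.println("selector match: " + bySelector.size());

        // The loop from the question, with the selector as "some string":
        for (Element table : doc.select(TABLE_SELECTOR)) {
            System.out.println(table.getElementsByTag("tbody").text());
        }
    }
}
```

Class names such as W(100%) come from the site's generated styling and are more likely to change than a descriptive class like data-table, which is why the selector above relies on the latter.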

Your IDE's autocompletion will show you which methods are available. This should be enough for you to figure out where to go from here.
