How to extract/parse this HTML table in Java using jsoup?

I loosely know how to parse HTML tables in jsoup, but the table I'm working with is somewhere inside the webpage and I don't know how to get to it: https://finance.yahoo.com/calendar/earnings?symbol=nflx

It's the one with the earnings dates.

I know that you have to do

Document doc = Jsoup.connect("https://finance.yahoo.com/calendar/earnings?symbol=nflx").get();

Then in a loop:

for (Element table : doc.select("some string")) {

How do I get the needed string for the table?

You don't actually need to loop over the document with for (Element table : doc.select("some string")) { ... }; you can get the table directly.

To get the table, first inspect the page's markup with the Developer Tools of your favorite browser (assuming it has them).

Then identify the element you want to get. In your case the specific table is:

<table class="data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)" data-reactid="4">

The code to get to it is:

Document doc = Jsoup.connect("https://finance.yahoo.com/calendar/earnings?symbol=nflx")
                    .timeout(600000) //added timeout because my internet sucks
                    .get();
Elements tableDiv = doc.getElementsByAttributeValue("class", "data-table W(100%) Bdcl(c) Pos(r) BdB Bdc($c-fuji-grey-c)");

This gives you an org.jsoup.select.Elements collection, which you can parse in the same way, getting the elements inside the table with the getElementsBy[whateverAreAvailable] methods.

Here is an example of how to print only that table:
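Matching on the full class string only works while that exact attribute value stays unchanged. As a hedged alternative, a CSS selector that matches just the data-table class token may be less brittle. The sketch below parses a small inline HTML string instead of the live page; the class name SelectTableSketch, the helper findTable, and the sample markup with its cell values are all made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectTableSketch {
    // Hypothetical, trimmed-down markup standing in for the real page.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<table class=\"data-table W(100%) Bdcl(c)\">"
        + "<tbody><tr><td>NFLX</td><td>Date A</td></tr></tbody>"
        + "</table></body></html>";

    // Returns the first <table> whose class list contains "data-table",
    // or null when no such table exists in the given HTML.
    static Element findTable(String html) {
        Document doc = Jsoup.parse(html);
        return doc.selectFirst("table.data-table");
    }

    public static void main(String[] args) {
        Element table = findTable(SAMPLE_HTML);
        if (table == null) {
            System.out.println("table not found");
        } else {
            System.out.println(table.select("tr").size() + " row(s) found");
        }
    }
}
```

With a live document, the same selector can be passed to doc.select(...) on the Document returned by Jsoup.connect(...).get(), assuming the table is present in the HTML that jsoup receives.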

// For each matched table, print every <tbody> it contains (rows included).
tableDiv.forEach(table -> table.getElementsByTag("tbody")
                               .forEach(tbody -> System.out.println(tbody)));
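Printing the tbody gives raw HTML. To get at the values themselves, one option is to walk the rows and read each cell's text. This is only a sketch under assumptions: TableRowsSketch, extractRows, and the inline sample markup (including the placeholder cell values) are invented for illustration, and the real table's columns may differ.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TableRowsSketch {
    // Hypothetical sample markup; the cell values are placeholders, not real data.
    static final String SAMPLE_HTML =
        "<table class=\"data-table W(100%)\">"
        + "<thead><tr><th>Symbol</th><th>Earnings Date</th></tr></thead>"
        + "<tbody>"
        + "<tr><td>NFLX</td><td>Date A</td></tr>"
        + "<tr><td>NFLX</td><td>Date B</td></tr>"
        + "</tbody></table>";

    // Collects the text of every <td> in each body row of the data-table.
    static List<List<String>> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : doc.select("table.data-table tbody tr")) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.select("td")) {
                cells.add(td.text()); // empty cells are kept as "" so columns stay aligned
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : extractRows(SAMPLE_HTML)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Against the live page, the same loop could run over the Document fetched with Jsoup.connect(...).get() in place of the parsed sample string.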

Use your favorite IDE to find out which methods to use. I think this is enough for you to figure out where to go.
