I have a table with the following HTML:
<TABLE class=data-table cellSpacing=0 cellPadding=0>
<TBODY>
<TR>
<TD colSpan=4><A id=accounting name=accounting></A>
<H3>Accounting</H3></TD></TR>
<TR>
<TH class=data-tablehd align=left>FORM NO.</TH>
<TH class=data-tablehd align=left>TITLE</TH>
<TH class=data-tablehd align=right>Microsoft</TH>
<TH class=data-tablehd align=right>Acrobat</TH></TR>
<TR>
<TD><A id=1008ft name=1008ft>SF 1008-FT</A></TD>
<TD>Work for Others Funding Transfer Between Projects for an Agreement</TD>
<TD align=right><A
href="https://someurl1"
target=top>MS Word</A></TD>
<TD align=right><A
href="https://someurl2"
target=top>PDF </A></TD></TR>
...
I need to parse the <TR> data, getting something like:
SF 1008-FT, Work for Others ... an Agreement, https://someurl1, https://someurl2
I have tried using the following code:
URL formURL = new URL("http://urlToParse");
Document doc = Jsoup.parse(formURL, 3000);
Element table = doc.select("TABLE[class = data-table]").first();
Iterator<Element> ite = table.select("td[colSpan=4]").iterator();
while(ite.next() != null) {
    System.out.println(ite.next().text());
}
However, this only returns the "back to Top" text and some of the section headings located throughout the table.
Can someone help me write the correct JSoup code to parse the information I need?
I haven't had time to test this, but you can use something like:
Element table = doc.select("TABLE[class = data-table]").first();
Elements rows = table.select("tr");
for (Element td : rows.get(2).children()) {
    System.out.println(td.text());
}
This prints the text of each child cell of the third row of the table (index 2).
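Separately, the loop in the question calls ite.next() twice per pass (once in the while condition and once in the println), so it prints only every second match and ends with a NoSuchElementException once the elements run out. A minimal sketch of the usual hasNext()/next() idiom, using plain strings instead of Jsoup elements (the class name and sample values are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorLoopDemo {

    // Visits every element: hasNext() only checks, next() advances exactly once per pass.
    static List<String> collectAll(List<String> items) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = items.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the text of the matched cells.
        List<String> cells = Arrays.asList("Accounting", "Budget", "Travel");
        System.out.println(collectAll(cells));
    }
}
```

With Jsoup, an enhanced for loop over the Elements returned by select(...) does the same thing without handling the iterator by hand.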
I found the solution by making a small modification to the code from a similar thread. The code that provides the solution is given below:
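for (Element table : doc.select("table")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        // Skip header (<th>) rows and section-heading rows, which have fewer than four <td> cells.
        if (tds.size() < 4) {
            continue;
        }
        String formNumber = tds.get(0).text();
        String title = tds.get(1).text();
        // attr("href") returns the href of the first matching link in the cell.
        String link1 = tds.get(2).select("a[href]").attr("href");
        String link2 = tds.get(3).select("a[href]").attr("href");
        System.out.println(formNumber + ", " + title + ", " + link1 + ", " + link2);
    }
}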
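For anyone who wants to try the approach above without hitting the real page, here is a self-contained sketch that parses a trimmed copy of the markup from the question held in a string. It assumes Jsoup is on the classpath; the class name FormTableParser, the SAMPLE_HTML constant and the parseRows helper are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FormTableParser {

    // Trimmed copy of the table from the question, used as sample input.
    static final String SAMPLE_HTML =
        "<table class=data-table>"
        + "<tr><td colspan=4><a id=accounting name=accounting></a><h3>Accounting</h3></td></tr>"
        + "<tr><th>FORM NO.</th><th>TITLE</th><th>Microsoft</th><th>Acrobat</th></tr>"
        + "<tr><td><a id=1008ft name=1008ft>SF 1008-FT</a></td>"
        + "<td>Work for Others Funding Transfer Between Projects for an Agreement</td>"
        + "<td align=right><a href=\"https://someurl1\" target=top>MS Word</a></td>"
        + "<td align=right><a href=\"https://someurl2\" target=top>PDF </a></td></tr>"
        + "</table>";

    // Returns one "form number, title, link1, link2" line per data row.
    static List<String> parseRows(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element row : doc.select("table.data-table tr")) {
            Elements tds = row.select("td");
            // Data rows have exactly four <td> cells; heading and <th> rows do not.
            if (tds.size() != 4) {
                continue;
            }
            String formNumber = tds.get(0).text();
            String title = tds.get(1).text();
            String link1 = tds.get(2).select("a[href]").attr("href");
            String link2 = tds.get(3).select("a[href]").attr("href");
            lines.add(String.join(", ", formNumber, title, link1, link2));
        }
        return lines;
    }

    public static void main(String[] args) {
        parseRows(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

To run it against the live page, the same loop should work on the Document returned by Jsoup.parse(formURL, 3000) from the question. The cell-count check is what keeps the "Accounting" heading row and the column-title row out of the output.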