
Extract first-level elements in jsoup, without recursing into nested tables

I need to display the top-level row of this table, which has another table nested inside it:

<html><body>
<div id = div1><table><tbody>
<tr><td>Steve</td>
<td><table><tbody><tr><td>Steve2</td></tr></tbody></table>
</tr></tbody></table></body></html>

There can be more than one row. I want to extract the content of each tr at the first level only (not <tr><td>Steve2</td></tr>).

This is the code:

String html = "<html><body>"
+ "<div id = div1><table><tbody>"
+ "<tr><td>Steve</td>"
+ "<td><table><tbody><tr><td>Steve2</td></tr></tbody></table>"
+ "</tr></tbody></table></body></html>";
Document doc = Jsoup.parse(html);
Elements elemHtml = doc.select("div#div1>table");
for(Element elem1:elemHtml) {
    Elements elem2 = elem1.select("tr");
    for(Element elem3:elem2) {
        System.out.println("Content: "+elem3);
        System.out.println("----------");
    }
}

I tried adding a <div> tag inside the table, but then the parsing no longer works.

Change your CSS selector to div#div1>table>tbody>tr so that it matches only the <tr> elements directly under the outer table's <tbody>. That is what > means in CSS: it selects direct children only, not all descendants.
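To make the difference concrete, here is a minimal sketch that counts the rows matched by a descendant selector and by the child selector on a well-formed version of the question's HTML. It assumes jsoup is on the classpath; the class name TopLevelRows and the helper countRows are illustrative names, not part of the original post.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TopLevelRows {
    // The question's HTML (closing tags added): one outer row that
    // contains a nested one-row table.
    static final String HTML = "<html><body>"
            + "<div id=div1><table><tbody>"
            + "<tr><td>Steve</td>"
            + "<td><table><tbody><tr><td>Steve2</td></tr></tbody></table></td>"
            + "</tr></tbody></table></div></body></html>";

    // Hypothetical helper: how many elements does this CSS selector match?
    static int countRows(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Descendant combinator (space): outer row plus nested row.
        System.out.println(countRows("div#div1 > table tr"));           // 2
        // Child combinator (>): only the outer row.
        System.out.println(countRows("div#div1 > table > tbody > tr")); // 1
    }
}
```

The question's original code behaves like the first call: elem1.select("tr") searches all descendants of the outer table, so the nested row is returned as well.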

I've made some more complex HTML to show that the solution works for a more general case than the one in the question:

<html> <body> <div id = div1> <table> <tbody>
<tr> <td>Steve1</td> <td> <table> <tbody> <tr>
<td>Steve2a</td> </tr> <tr> <td>Steve2b</td>
</tr> </tbody> </table> </tr> <tr> <td>Steve3</td>
<td> <table> <tbody> <tr> <td>Steve4</td> </tr>
</tbody> </table> </tr> </tbody> </table>
</body> </html>

which renders as a table with two outer rows (Steve1 and Steve3), each of which contains a nested table.

Use the selector div#div1>table> tbody > tr to get only the top-level rows of the table,
then iterate over those rows and take the first cell of each one with select("td").first().
Full code:

Document doc = null;
String html2 = "<html> <body> <div id = div1> <table> <tbody>" +
    "<tr> <td>Steve1</td> <td> <table> <tbody> <tr>" +
    "<td>Steve2a</td> </tr> <tr> <td>Steve2b</td>" +
    "</tr> </tbody> </table> </tr> <tr> <td>Steve3</td>" +
    "<td> <table> <tbody> <tr> <td>Steve4</td> </tr>" +
    "</tbody> </table> </tr> </tbody> </table>" +
    "</body> </html>";

doc = Jsoup.parse(html2);
Elements outerRows = doc.select("div#div1>table> tbody > tr");
for(Element row : outerRows) {
    Element data = row.select("td").first();
    System.out.println(data);
    System.out.println("------------");
}

If you want only the text (SteveX), then you can get it with the text method:
System.out.println(data.text());
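
Putting the two steps together, the following sketch collects the first-cell text of every top-level row into a list. It assumes jsoup is on the classpath; the class FirstCellText, the constant SAMPLE, and the method firstCellTexts are hypothetical names introduced for illustration, and SAMPLE is a compact, well-formed variant of the HTML above.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstCellText {
    // Two outer rows (Steve1, Steve3), each holding a nested table.
    static final String SAMPLE = "<div id=div1><table><tbody>"
            + "<tr><td>Steve1</td><td><table><tbody>"
            + "<tr><td>Steve2a</td></tr><tr><td>Steve2b</td></tr>"
            + "</tbody></table></td></tr>"
            + "<tr><td>Steve3</td><td><table><tbody>"
            + "<tr><td>Steve4</td></tr>"
            + "</tbody></table></td></tr>"
            + "</tbody></table></div>";

    // Hypothetical helper: text of the first cell of each top-level row.
    static List<String> firstCellTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // The child combinators keep rows of nested tables out of the result.
        for (Element row : doc.select("div#div1 > table > tbody > tr")) {
            Element firstCell = row.select("td").first();
            if (firstCell != null) {
                texts.add(firstCell.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(firstCellTexts(SAMPLE)); // [Steve1, Steve3]
    }
}
```

The null check is there because first() returns null when a row has no td cells, for example a header row made only of th elements.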
