JSoup: extract the text from table td elements that don't contain any HTML nodes
I have an HTML string like this:
String html = "<table><tbody>"
    + "<tr>"
    + "<td><p>ABC</p></td>"
    + "<td>DEF</td>"
    + "</tr>"
    + "<tr>"
    + "<td><p>GHI</p></td>"
    + "<td>MNO</td>"
    + "</tr>"
    + "</tbody>"
    + "</table>";
I only need to extract the text of td tags that have no child elements inside them. My current code returns both the text and the HTML nodes.
Elements elements = doc.select("tbody > tr");
for (Element e : elements) {
System.out.println(e.select("td").html());
}
But what I need as the output is:
DEF
MNO
Thanks in advance.
It is not clear to me whether you want just the text of each td that is not part of a child of that td, or whether you want to further exclude all tds that have children. Therefore you may have to adapt my solution a bit.
String html="<table><tbody>"
+"<tr>"
+"<td><p>ABC</p></td>"
+"<td>DEF</td>"
+"<td>DEF2<p>ABC</p></td>"
+"</tr>"
+"<tr>"
+"<td><p>GHI</p></td>"
+"<td>MNO</td>"
+"<td>MNO2<p>GHI2</p></td>"
+"</tr>"
+"</tbody>"
+"</table>";
Document doc = Jsoup.parse(html);
Elements elements = doc.select("tbody > tr > td:matchesOwn(.+)");
for (Element e : elements) {
System.out.println(e.text());
}
The above solution looks for td elements that have any text of their own, i.e. that match the regular expression .+ (at least one character).
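If the first reading is the one you need (only the td's own text, ignoring text inside child elements), note that text() also includes the text of descendants, so Element.ownText() may be the closer fit. A minimal sketch, assuming the same sample HTML; the class name OwnTextDemo and the helper ownTexts are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class OwnTextDemo {
    // Hypothetical helper: collects the own text of every td that has some.
    static List<String> ownTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element td : doc.select("tbody > tr > td:matchesOwn(.+)")) {
            // ownText() skips text that belongs to child elements such as <p>
            result.add(td.ownText());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table><tbody>"
                + "<tr><td><p>ABC</p></td><td>DEF</td><td>DEF2<p>ABC</p></td></tr>"
                + "<tr><td><p>GHI</p></td><td>MNO</td><td>MNO2<p>GHI2</p></td></tr>"
                + "</tbody></table>";
        for (String s : ownTexts(html)) {
            System.out.println(s);
        }
    }
}
```

With this input the cells holding only a p element are skipped, and the mixed cells contribute just their own text (DEF2 and MNO2) without the text of the nested p.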
If you want to further weed out the tds that contain children, you can do this:
Document doc = Jsoup.parse(html);
Elements elements = doc.select("tbody > tr > td:matchesOwn(.+):not(:has(*))");
for (Element e : elements) {
System.out.println(e.text());
}
This uses both the :has() and the :not() pseudo selectors, as explained in the JSOUP docs.
Try this CSS selector:
tbody > tr > td:not(:has(*))
http://try.jsoup.org/~K4qiK0SxQDeuhE9FvvmUDa3vKKI
tbody /* Select any tbody */
> tr /* Select any tr directly under it */
> td /* Select any td directly under it ... */
:not(:has(*)) /* ... not having any element */
The * operator matches only elements. A text node is not an element. It's just a kind of Node.
Elements elements = doc.select("tbody > tr > td:not(:has(*))");
for (Element e : elements) {
System.out.println(e.select("td").html());
}
<td>DEF</td>
<td>MNO</td>
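To make the element versus node distinction concrete, here is a small sketch that counts both for each td. It assumes a one-row fragment with no whitespace between tags (whitespace would add extra text nodes); the class NodeVsElementDemo and the helper describeCells are invented for this example:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class NodeVsElementDemo {
    // Hypothetical helper: reports child node and child element counts per td.
    static List<String> describeCells(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element td : doc.select("td")) {
            // childNodeSize() counts text nodes too; children() holds elements only
            out.add(td.childNodeSize() + " node(s), " + td.children().size() + " element(s)");
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<table><tbody><tr>"
                + "<td><p>ABC</p></td><td>DEF</td><td>DEF2<p>ABC</p></td>"
                + "</tr></tbody></table>";
        for (String line : describeCells(html)) {
            System.out.println(line);
        }
    }
}
```

The td holding only DEF has one child node but zero child elements, which is why :has(*) does not match it and :not(:has(*)) keeps it.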
Try Element.child(int index) with index = 0.
Elements elements = doc.select("tbody > tr");
for (Element e : elements) {
for (Element el : e.select("td")) {
// el.child(0)
}
}
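One caveat with this approach: Element.child(0) throws an IndexOutOfBoundsException when the td has no child elements, which is exactly the case the question is after. A hedged sketch of how the loop could be completed, checking children() first; the class ChildGuardDemo and the helper textOfChildlessCells are illustrative names, not part of the answer above:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class ChildGuardDemo {
    // Hypothetical helper: returns the text of tds that have no child elements.
    static List<String> textOfChildlessCells(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element row : doc.select("tbody > tr")) {
            for (Element td : row.select("td")) {
                if (td.children().isEmpty()) {
                    // No child elements, so the cell holds plain text only
                    result.add(td.text());
                }
                // Otherwise td.child(0) is safe to call here if it is needed
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<table><tbody>"
                + "<tr><td><p>ABC</p></td><td>DEF</td></tr>"
                + "<tr><td><p>GHI</p></td><td>MNO</td></tr>"
                + "</tbody></table>";
        for (String s : textOfChildlessCells(html)) {
            System.out.println(s);
        }
    }
}
```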