Jsoup reports an incorrect children size
Jsoup incorrectly counts the number of children:
Document document = Jsoup.parse(testString);
Element div = document.select("div").first();
Elements divChildren = div.children();
System.out.println(divChildren.size());
For example, if testString =
<div><div><p>text1</p></div><p>text2</p></div>
or
<div><h1><p>text1</p></h1><p>text2</p></div>
then divChildren.size() = 2
But if testString =
<div><p><p>text1</p></p><p>text2</p></div>
then divChildren.size() = 4
What am I doing wrong?
If you take a look at what document holds after parsing
String testString ="<div><p><p>text1</p></p><p>text2</p></div>";
you will see
<html>
<head></head>
<body>
<div>
<p></p>
<p>text1</p>
<p></p>
<p>text2</p>
</div>
</body>
</html>
As @Rejesh pointed out, p can't contain other block-level elements such as another p, so Jsoup prevents this by closing the incorrect outer p element early. The opening and closing tags are handled separately: the outer <p> is closed as soon as the inner <p> opens, and the leftover </p> becomes an empty p of its own. In your case
<p><p>text</p></p>
will become
<p></p> <p>text1</p> <p></p>
so your div
<div><p><p>text1</p></p><p>text2</p></div>
will be parsed as
<div>
<p></p>
<p>text1</p>
<p></p>
<p>text2</p>
</div>
and as you can see, it has four children (two empty p elements and two p elements with text).
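To make the implicit closing concrete, here is a minimal sketch of the rule in plain Java. It is a toy model, not Jsoup's actual implementation: the class name ImplicitParagraphClose, the method paragraphs and the pre-split token list are all made up for illustration, and text outside any p is simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

class ImplicitParagraphClose {

    // Toy model of the HTML rule for p tags (hypothetical helper, not Jsoup code).
    // Tokens are "<p>", "</p>" or plain text; returns the text of each resulting p.
    static List<String> paragraphs(String... tokens) {
        List<String> result = new ArrayList<>();
        StringBuilder open = null; // text of the currently open p, or null if none is open
        for (String token : tokens) {
            if (token.equals("<p>")) {
                if (open != null) {
                    result.add(open.toString()); // a new <p> implicitly closes the open one
                }
                open = new StringBuilder();
            } else if (token.equals("</p>")) {
                // a stray </p> with no open p produces an empty p element
                result.add(open == null ? "" : open.toString());
                open = null;
            } else if (open != null) {
                open.append(token); // text inside the open p
            }
        }
        if (open != null) {
            result.add(open.toString()); // end of input closes a p that is still open
        }
        return result;
    }

    public static void main(String[] args) {
        // Tokens of <p><p>text1</p></p><p>text2</p>
        List<String> ps = paragraphs("<p>", "<p>", "text1", "</p>", "</p>", "<p>", "text2", "</p>");
        System.out.println(ps.size());
    }
}
```

For the tokens of <p><p>text1</p></p><p>text2</p> the model yields four paragraphs, two of them empty, which matches the four children Jsoup reports.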
If you want to turn off this correction mechanism, you can use the XML parser instead of the standard HTML parser:
String testString ="<div><p><p>text1</p></p><p>text2</p></div>";
Document document = Jsoup.parse(testString,"",Parser.xmlParser());
System.out.println(document);
Element div = document.select("div").first();
Elements divChildren = div.children();
System.out.println(divChildren.size());
The last line will now print 2.
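The same strict-nesting behaviour can be checked without Jsoup. The sketch below uses the JDK's built-in XML DOM parser as a stand-in for Parser.xmlParser() (an assumption made for illustration, it is not part of the answer above), and the class StrictNestingCount and its method are hypothetical names. An XML parser has no HTML rules about which elements may nest, so the inner p stays inside the outer one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StrictNestingCount {

    // Parses the markup as XML with the JDK DOM parser (not Jsoup) and
    // counts the element children of the root element.
    static int rootElementChildren(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            int count = 0;
            for (int i = 0; i < nodes.getLength(); i++) {
                // skip text nodes, count only elements
                if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            // malformed markup or parser configuration problems
            throw new IllegalStateException("Could not parse markup as XML", e);
        }
    }

    public static void main(String[] args) {
        String testString = "<div><p><p>text1</p></p><p>text2</p></div>";
        System.out.println(rootElementChildren(testString));
    }
}
```

With the nested-p input the root div has two element children here, the same count the Jsoup XML parser gives.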