Jsoup returns an incorrect children size

Question

Jsoup incorrectly counts the number of children:

    Document document = Jsoup
            .parse(testString);

    Element div = document.select("div").first();
    Elements divChildren = div.children();
    System.out.println(divChildren.size());

For example, if testString =

<div><div><p>text1</p></div><p>text2</p></div>

or

<div><h1><p>text1</p></h1><p>text2</p></div>

then divChildren.size() = 2

But if testString =

<div><p><p>text1</p></p><p>text2</p></div>

then divChildren.size() = 4

What am I doing wrong?

Answer 1

Because of this:

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

Answer 2

If you look at what document holds after parsing

String testString = "<div><p><p>text1</p></p><p>text2</p></div>";

you will see

<html>
 <head></head>
 <body>
  <div>
   <p></p>
   <p>text1</p>
   <p></p>
   <p>text2</p>
  </div>
 </body>
</html>

As @Rejesh pointed out, a p element cannot contain other block-level elements, including p itself, so Jsoup repairs the markup by closing the invalid outer p. The opening tag and the closing tag are handled separately: the outer <p> is closed as soon as the inner <p> starts, and the leftover </p> becomes an empty p element of its own. In your case

    <p><p>text1</p></p>

will become

<p></p> <p>text1</p> <p></p>

so your div

<div><p><p>text1</p></p><p>text2</p></div>

will be parsed as

  <div>
   <p></p>
   <p>text1</p>
   <p></p>
   <p>text2</p>
  </div>

and, as you can see, the div has 4 children (two empty p elements and two p elements with text).
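The repair described above can be mimicked with a small toy model. This is only a sketch of the two rules mentioned in this answer (a new <p> closes an open <p>, and a stray </p> becomes an empty p element); it is not Jsoup's implementation, and the class PNestingModel and its helpers are hypothetical names invented for illustration. It ignores attributes, text nodes, and every other HTML parsing rule.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model, NOT Jsoup's parser: it only imitates the <p> repair rules.
class PNestingModel {

    /** Minimal tree node: a tag name plus child elements (text is ignored). */
    static final class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();
        Node(String tag) { this.tag = tag; }
    }

    // Matches <tag> and </tag> only; attributes are out of scope for this sketch.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z0-9]+)>");

    static Node parse(String html) {
        Node root = new Node("#root");
        Deque<Node> open = new ArrayDeque<>(); // stack of currently open elements
        open.push(root);
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String tag = m.group(2).toLowerCase();
            if (!closing) {
                // Rule 1: a new <p> implicitly closes a <p> that is still open.
                if (tag.equals("p") && isOpen(open, "p")) {
                    popThrough(open, "p");
                }
                Node node = new Node(tag);
                open.peek().children.add(node);
                open.push(node);
            } else if (isOpen(open, tag)) {
                popThrough(open, tag); // ordinary end tag
            } else if (tag.equals("p")) {
                // Rule 2: a stray </p> turns into an empty <p></p>.
                open.peek().children.add(new Node("p"));
            }
            // Any other stray end tag is simply ignored in this model.
        }
        return root;
    }

    /** Child count of the first top-level element (assumed to be the outer div). */
    static int countDivChildren(String html) {
        return parse(html).children.get(0).children.size();
    }

    private static boolean isOpen(Deque<Node> open, String tag) {
        for (Node n : open) {
            if (n.tag.equals(tag)) return true;
        }
        return false;
    }

    private static void popThrough(Deque<Node> open, String tag) {
        while (!open.pop().tag.equals(tag)) {
            // keep popping until the matching element has been removed
        }
    }

    public static void main(String[] args) {
        System.out.println(countDivChildren("<div><div><p>text1</p></div><p>text2</p></div>")); // 2
        System.out.println(countDivChildren("<div><h1><p>text1</p></h1><p>text2</p></div>"));   // 2
        System.out.println(countDivChildren("<div><p><p>text1</p></p><p>text2</p></div>"));     // 4
    }
}
```

The three inputs are the test strings from the question, and the model reproduces the counts reported there: only the nested p case gets split into four siblings.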

If you want to turn off this correcting mechanism, you can use the XML parser instead of the standard HTML parser. The following code

String testString = "<div><p><p>text1</p></p><p>text2</p></div>";

// The XML parser keeps the markup as written instead of applying HTML nesting rules.
Document document = Jsoup.parse(testString, "", Parser.xmlParser());
System.out.println(document);
Element div = document.select("div").first();
Elements divChildren = div.children();
System.out.println(divChildren.size());

will now print 2 as the number of children.
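The nesting survives here because XML has no rule against a p inside a p. As a rough cross-check that does not depend on Jsoup at all, the same string can be fed to the JDK's built-in DOM parser. This is a sketch under that assumption; XmlChildCount and countElementChildren are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper using the JDK XML parser (not Jsoup) to count element children.
class XmlChildCount {

    /** Parses the string as XML and counts the element children of the root element. */
    static int countElementChildren(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            int count = 0;
            NodeList nodes = root.getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Skip text nodes; only element nodes correspond to Jsoup's children().
                if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        String testString = "<div><p><p>text1</p></p><p>text2</p></div>";
        System.out.println(countElementChildren(testString)); // prints 2
    }
}
```

Keep in mind that an XML parser is strict: real-world HTML with unclosed tags will fail to parse this way, which is the trade-off for switching off the HTML repair rules.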


Answer 1
2 2014-06-21 14:22:22

Answer 2
2 accepted 2014-06-21 14:32:25
