Jsoup: incorrect value for children size

Jsoup incorrectly counts the number of children:

    Document document = Jsoup
            .parse(testString);

    Element div = document.select("div").first();
    Elements divChildren = div.children();
    System.out.println(divChildren.size());

For example, if testString =

<div><div><p>text1</p></div><p>text2</p></div>

or

<div><h1><p>text1</p></h1><p>text2</p></div>

then divChildren.size() = 2

But if testString =

<div><p><p>text1</p></p><p>text2</p></div>

then divChildren.size() = 4

What am I doing wrong?

Because of this (from the HTML 4.01 specification):

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

If you take a look at what document holds after parsing

String testString ="<div><p><p>text1</p></p><p>text2</p></div>";

you will see

<html>
 <head></head>
 <body>
  <div>
   <p></p>
   <p>text1</p>
   <p></p>
   <p>text2</p>
  </div>
 </body>
</html>

As @Rejesh pointed out, p can't contain other block-level elements such as p itself, so Jsoup, like a browser, repairs the markup instead of nesting the elements: the inner opening <p> implicitly closes the outer p, which leaves it empty, and the now unmatched closing </p> is turned into another empty p. In your case

    <p><p>text1</p></p>

will become

<p></p> <p>text1</p> <p></p>

so your div

<div><p><p>text1</p></p><p>text2</p></div>

will be parsed as

  <div>
   <p></p>
   <p>text1</p>
   <p></p>
   <p>text2</p>
  </div>

As you can see, there are four children: two empty p elements and two p elements with text.
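To make the repair rule concrete, here is a minimal sketch in plain Java. It is a toy model, not Jsoup's actual implementation: the class name `ParagraphClosingSketch` and the method `paragraphs` are made up for illustration, and it only understands a flat run of `<p>`, `</p>` and text. It applies the two rules described above: a `<p>` start tag closes any p that is still open, and a `</p>` with no open p produces an empty p.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphClosingSketch {

    /**
     * Returns the text of each p element produced from a flat run of
     * <p>, </p> and text tokens (hypothetical helper, illustration only).
     */
    static List<String> paragraphs(String html) {
        List<String> result = new ArrayList<>();
        StringBuilder open = null; // text of the currently open p, or null if none
        Matcher m = Pattern.compile("<p>|</p>|[^<]+").matcher(html);
        while (m.find()) {
            String token = m.group();
            if (token.equals("<p>")) {
                if (open != null) {
                    result.add(open.toString()); // implicit close of the outer p
                }
                open = new StringBuilder();
            } else if (token.equals("</p>")) {
                if (open != null) {
                    result.add(open.toString()); // normal close
                    open = null;
                } else {
                    result.add(""); // stray </p> becomes an empty <p></p>
                }
            } else if (open != null) {
                open.append(token); // text inside the open p
            }
        }
        if (open != null) {
            result.add(open.toString()); // close whatever is still open at the end
        }
        return result;
    }

    public static void main(String[] args) {
        // The content of the div from the question
        List<String> ps = paragraphs("<p><p>text1</p></p><p>text2</p>");
        System.out.println(ps.size() + " " + ps); // prints: 4 [, text1, , text2]
    }
}
```

The two empty strings in the result correspond to the two empty p elements in the parsed document, which is why the count is 4 rather than 2.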


If you want to turn off this correction mechanism, you can use the XML parser instead of the standard HTML parser:

String testString = "<div><p><p>text1</p></p><p>text2</p></div>";

// Parse with the XML parser (empty base URI), which keeps the nesting as written
Document document = Jsoup.parse(testString, "", Parser.xmlParser());
System.out.println(document);
Element div = document.select("div").first();
Elements divChildren = div.children();
System.out.println(divChildren.size());

With the XML parser, divChildren.size() is now 2.
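The same behaviour can be checked without Jsoup. The sketch below uses the JDK's built-in XML parser (a substitute for `Parser.xmlParser()`, chosen only so the example has no dependencies); the class name `XmlChildCountSketch` and the helper `rootChildElementCount` are made up for illustration. Since an XML parser knows nothing about HTML content rules, the nested p stays nested.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlChildCountSketch {

    /** Counts the direct element children of the root element (hypothetical helper). */
    static int rootChildElementCount(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            int count = 0;
            NodeList nodes = root.getChildNodes();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Skip text nodes; count element children only
                if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // The outer p keeps the inner p as its child, so the div has 2 children
        System.out.println(rootChildElementCount(
                "<div><p><p>text1</p></p><p>text2</p></div>")); // prints: 2
    }
}
```

Note that XML parsing is strict: input that is not well-formed (for example an unclosed tag) is rejected by the JDK parser rather than repaired.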
