Jsoup.parse().body().getAllElements() doubles tags

Is there a reason why Jsoup doubles the contents of the body tag here?

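public static void main(String[] args) {
    // myHtmlString holds the HTML below; needs org.jsoup.Jsoup and org.jsoup.select.Elements
    Elements all = Jsoup.parse(myHtmlString).body().getAllElements();
    System.out.println(all); // prints the result shown further down
}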

It happens with this HTML:

<html>
 <head> 
  <style> p{margin-bottom:0px;margin-top:0px;} body{font-family:Arial;font-size:10pt;} </style> 
 </head> 
 <body> 
  <div class="wordsection1"> 
   <p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p> 
   <p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p> 
   <div> 
    <div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;"> 
     <p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] <br><b>Gesendet:</b> Dienstag, 6. August 2013 08:59<br><b>An:</b> Helmut Grashoff (dsfaasas@gmbh.de)<br><b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span></p> 
    </div> 
   </div> 
   <p class="msonormal">&nbsp;</p> 
   <p class="msonormal">Erst schauen wir mal, ob die Mail &uuml;berhaupt ankommt.</p> 
   <p class="msonormal">&nbsp;</p> 
   <p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p> 
   <p class="msonormal">&nbsp;</p> 
   <p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p> 
   <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
   <p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p> 
   <p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p> 
   <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
   <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p> 
   <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p> 
   <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 M&uuml;lheim-K&auml;rlich</span></p> 
   <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
   <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
   <p class="msonormal">&nbsp;</p> 
  </div> 
 </body>
</html>

The result of the call above (note that it covers only the body part):

<body> 
 <div class="wordsection1"> 
  <p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p> 
  <p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p> 
  <div> 
   <div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;"> 
    <p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdafsdf@gmbh.de)Br2nL<b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span></p> 
   </div> 
  </div> 
  <p class="msonormal">&nbsp;</p> 
  <p class="msonormal">Erst schauen wir mal, ob die Mail &uuml;berhaupt ankommt.</p> 
  <p class="msonormal">&nbsp;</p> 
  <p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p> 
  <p class="msonormal">&nbsp;</p> 
  <p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p> 
  <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
  <p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p> 
  <p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p> 
  <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
  <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p> 
  <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p> 
  <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 M&uuml;lheim-K&auml;rlich</span></p> 
  <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
  <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
  <p class="msonormal">&nbsp;</p> 
 </div>  
</body>
<div class="wordsection1"> 
 <p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p> 
 <p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p> 
 <div> 
  <div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;"> 
   <p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asfasd@gmbh.de)Br2nL<b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span></p> 
  </div> 
 </div> 
 <p class="msonormal">&nbsp;</p> 
 <p class="msonormal">Erst schauen wir mal, ob die Mail &uuml;berhaupt ankommt.</p> 
 <p class="msonormal">&nbsp;</p> 
 <p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p> 
 <p class="msonormal">&nbsp;</p> 
 <p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p> 
 <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
 <p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p> 
 <p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p> 
 <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
 <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p> 
 <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p> 
 <p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 M&uuml;lheim-K&auml;rlich</span></p> 
 <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
 <p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p> 
 <p class="msonormal">&nbsp;</p> 
</div>
<p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p>
<span style="color:#1F497D;">&nbsp;</span>
<p class="msonormal"> <span style="color:#1F497D;">&nbsp;</span></p>
<span style="color:#1F497D;">&nbsp;</span>
<div> 
 <div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;"> 
  <p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdffasd@gmbh.de)Br2nL<b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span></p> 
 </div> 
</div>
<div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;"> 
 <p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdfsad@gmbh.de)Br2nL<b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span></p> 
</div>
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdfsdfa@gmbh.de)Br2nL<b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span></p>
<b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b>
<span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span>
<span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdfsdf@gmbh.de)Br2nL<b>Betreff:</b> erster Test f&uuml;r die 2.2.1 (mit HTML)</span>
<b>Gesendet:</b>
<b>An:</b>
<b>Betreff:</b>
<p class="msonormal">&nbsp;</p>
<p class="msonormal">Erst schauen wir mal, ob die Mail &uuml;berhaupt ankommt.</p>
<p class="msonormal">&nbsp;</p>
<p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p>
<i> <u>gleichzeitig</u></i>
<u>gleichzeitig</u>
<span style="font-size:18.0pt;">spiele</span>
<span style="color:#31859C;">wenig </span>
<span style="font-family:Algerian;">mit dieser Zeile</span>
<p class="msonormal">&nbsp;</p>
<p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p>
<span mso-fareast-language="DE">Mit freundlichen Gruessen</span>
<p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p>
<span mso-fareast-language="DE">&nbsp;</span>
<p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p>
<span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span>
<p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p>
<span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span>
<p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p>
<span mso-fareast-language="DE">&nbsp;</span>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 M&uuml;lheim-K&auml;rlich</span></p>
<span verdana","sans-serif";mso-fareast-language:de"="">56218 M&uuml;lheim-K&auml;rlich</span>
<p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p>
<span mso-fareast-language="DE">&nbsp;</span>
<p class="msonormal"> <span mso-fareast-language="DE">&nbsp;</span></p>
<span mso-fareast-language="DE">&nbsp;</span>
<p class="msonormal">&nbsp;</p>

I am using Java 6 and Jsoup 1.7.2.

This is expected and correct behavior.

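In fact, it doesn't double anything: the markup of each element is repeated as many times as its depth relative to the reference node. getAllElements() returns the reference element itself plus every descendant element, each exactly once, and printing that list prints each element's outer HTML, which includes all of its descendants.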
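To see the mechanism without jsoup or the long email, here is a minimal stdlib-only model. It is a sketch under stated assumptions: the class `RepeatDemo`, its nested `El` type and the methods `allElements`, `outerHtml` and `printAll` are hypothetical stand-ins, not jsoup's API. They only mimic the two behaviors described above: the list holds the element itself plus all descendant elements in document order, and printing the list prints each element's outer HTML on its own line.

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatDemo {
    // Hypothetical, simplified stand-in for a jsoup Element (not jsoup's API):
    // a tag name plus children that are nested elements or plain text strings.
    static final class El {
        final String tag;
        final List<Object> children = new ArrayList<>();

        El(String tag, Object... kids) {
            this.tag = tag;
            for (Object k : kids) children.add(k);
        }

        // Outer HTML: the element's own tags wrapped around all of its content.
        String outerHtml() {
            StringBuilder sb = new StringBuilder("<" + tag + ">");
            for (Object c : children) {
                sb.append(c instanceof El ? ((El) c).outerHtml() : c.toString());
            }
            return sb.append("</").append(tag).append(">").toString();
        }
    }

    // Mimics getAllElements(): the element itself, then every descendant
    // element in pre-order. Each element is added exactly once.
    static List<El> allElements(El root) {
        List<El> out = new ArrayList<>();
        out.add(root);
        for (Object c : root.children) {
            if (c instanceof El) out.addAll(allElements((El) c));
        }
        return out;
    }

    // Printing the list: one line of outer HTML per element.
    static String printAll(El root) {
        StringBuilder sb = new StringBuilder();
        for (El e : allElements(root)) sb.append(e.outerHtml()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        El body = new El("body", new El("div", new El("p", new El("b", "Betreff:"))));
        System.out.print(printAll(body));
    }
}
```

With four nested elements the list contains four distinct elements, yet `<b>Betreff:</b>` is printed four times, because every ancestor's line contains it again.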

Every element under body, and body itself, is included in the list, all the way down to the leaf elements (text nodes are not elements, so they are not listed on their own).
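The following sketch models which nodes end up in the list, assuming a simplified tree in which a node is either an element or a text node. `AllElementsOrder`, `Node` and `allElementTags` are hypothetical names for illustration only. Collecting just the tag names makes it visible that each element appears exactly once, that the starting element is included, and that text nodes are skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class AllElementsOrder {
    // Hypothetical minimal tree (not jsoup's API): a node is either an
    // element (tag != null) or a text node (text != null).
    static final class Node {
        final String tag;   // null for text nodes
        final String text;  // null for elements
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        static Node el(String tag, Node... kids) {
            Node n = new Node(tag, null);
            for (Node k : kids) n.children.add(k);
            return n;
        }

        static Node text(String t) {
            return new Node(null, t);
        }
    }

    // Collects the start node and every descendant element in pre-order;
    // text nodes are never added, mirroring what getAllElements() returns.
    static List<String> allElementTags(Node root) {
        List<String> tags = new ArrayList<>();
        collect(root, tags);
        return tags;
    }

    private static void collect(Node n, List<String> tags) {
        if (n.tag == null) return; // text node: not an element
        tags.add(n.tag);
        for (Node c : n.children) collect(c, tags);
    }

    public static void main(String[] args) {
        Node body = Node.el("body",
                Node.el("p", Node.text("Und "),
                        Node.el("i", Node.el("u", Node.text("gleichzeitig")))),
                Node.el("p", Node.text("\u00a0")));
        System.out.println(allElementTags(body)); // [body, p, i, u, p]
    }
}
```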

Consider <b>Betreff:</b>: it appears 7 times, once on its own and once inside the outer HTML of each of its 6 ancestors up to and including the root (<body>). It is also one of the most deeply nested elements in the tree:

body > div > div > div > p > span > b
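As a quick check of the count, this sketch builds only the single chain quoted above and counts how often `<b>Betreff:</b>` appears across the printed outer HTML. It assumes each element on the path has exactly one child element, which is a simplification of the real email. `DepthCount`, `outerHtmlOfChain` and `occurrences` are hypothetical helpers, not jsoup methods.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthCount {
    // Builds the outer HTML of every element on a single nested chain, in the
    // order a pre-order walk (like getAllElements()) would visit them.
    // Assumption: each element on the chain has exactly one child element.
    static List<String> outerHtmlOfChain(String[] tags, String innerText) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tags.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < tags.length; j++) sb.append('<').append(tags[j]).append('>');
            sb.append(innerText);
            for (int j = tags.length - 1; j >= i; j--) sb.append("</").append(tags[j]).append('>');
            result.add(sb.toString());
        }
        return result;
    }

    // Counts every occurrence of needle across all printed lines.
    static int occurrences(List<String> printed, String needle) {
        int count = 0;
        for (String s : printed) {
            for (int at = s.indexOf(needle); at >= 0; at = s.indexOf(needle, at + 1)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] path = "body > div > div > div > p > span > b".split("\\s*>\\s*");
        List<String> printed = outerHtmlOfChain(path, "Betreff:");
        System.out.println(path.length + " elements on the path");
        System.out.println(occurrences(printed, "<b>Betreff:</b>") + " copies of <b>Betreff:</b>");
    }
}
```

It prints "7 elements on the path" and "7 copies of <b>Betreff:</b>", matching the 7 repetitions in the output above: the number of copies equals the number of elements on the path from <body> down to the element, inclusive.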
