Jsoup.parse().body().getAllElements() doubles tags
Is there any reason why Jsoup doubles the content of the body tag here?
import org.jsoup.Jsoup;
public static void main(String[] args) {
    // myHtmlString holds the HTML document shown below
    Jsoup.parse(myHtmlString).body().getAllElements();
}
This happens with the following HTML:
<html>
<head>
<style> p{margin-bottom:0px;margin-top:0px;} body{font-family:Arial;font-size:10pt;} </style>
</head>
<body>
<div class="wordsection1">
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<div>
<div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;">
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] <br><b>Gesendet:</b> Dienstag, 6. August 2013 08:59<br><b>An:</b> Helmut Grashoff (dsfaasas@gmbh.de)<br><b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span></p>
</div>
</div>
<p class="msonormal"> </p>
<p class="msonormal">Erst schauen wir mal, ob die Mail überhaupt ankommt.</p>
<p class="msonormal"> </p>
<p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p>
<p class="msonormal"> </p>
<p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p>
<p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 Mülheim-Kärlich</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> </p>
</div>
</body>
</html>
The call above does not return just the content of the body once. Instead, the result is this:
<body>
<div class="wordsection1">
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<div>
<div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;">
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdafsdf@gmbh.de)Br2nL<b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span></p>
</div>
</div>
<p class="msonormal"> </p>
<p class="msonormal">Erst schauen wir mal, ob die Mail überhaupt ankommt.</p>
<p class="msonormal"> </p>
<p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p>
<p class="msonormal"> </p>
<p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p>
<p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 Mülheim-Kärlich</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> </p>
</div>
</body>
<div class="wordsection1">
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<div>
<div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;">
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asfasd@gmbh.de)Br2nL<b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span></p>
</div>
</div>
<p class="msonormal"> </p>
<p class="msonormal">Erst schauen wir mal, ob die Mail überhaupt ankommt.</p>
<p class="msonormal"> </p>
<p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p>
<p class="msonormal"> </p>
<p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p>
<p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 Mülheim-Kärlich</span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<p class="msonormal"> </p>
</div>
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<span style="color:#1F497D;"> </span>
<p class="msonormal"> <span style="color:#1F497D;"> </span></p>
<span style="color:#1F497D;"> </span>
<div>
<div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;">
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdffasd@gmbh.de)Br2nL<b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span></p>
</div>
</div>
<div style="padding-top:3.0pt;padding-left:0cm;padding-right:0cm;padding-bottom:0cm;border-left-style:none;border-top-width:1.0pt;border-bottom-style:none;border-right-style:none;border-top-color:#B5C4DF;border-top-style:solid;">
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdfsad@gmbh.de)Br2nL<b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span></p>
</div>
<p class="msonormal"> <b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b><span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdfsdfa@gmbh.de)Br2nL<b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span></p>
<b> <span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span></b>
<span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;">Von:</span>
<span tahoma","sans-serif";mso-fareast-language:de"="" style="font-size:10.0pt;"> Uasdsaa Aasdsaa [mailto:bullet1@left.de] Br2nL<b>Gesendet:</b> Dienstag, 6. August 2013 08:59Br2nL<b>An:</b> Helmut Grashoff (asdfsdf@gmbh.de)Br2nL<b>Betreff:</b> erster Test für die 2.2.1 (mit HTML)</span>
<b>Gesendet:</b>
<b>An:</b>
<b>Betreff:</b>
<p class="msonormal"> </p>
<p class="msonormal">Erst schauen wir mal, ob die Mail überhaupt ankommt.</p>
<p class="msonormal"> </p>
<p class="msonormal">Und <i> <u>gleichzeitig</u></i> <span style="font-size:18.0pt;">spiele</span> ich noch ein <span style="color:#31859C;">wenig </span> <span style="font-family:Algerian;">mit dieser Zeile</span></p>
<i> <u>gleichzeitig</u></i>
<u>gleichzeitig</u>
<span style="font-size:18.0pt;">spiele</span>
<span style="color:#31859C;">wenig </span>
<span style="font-family:Algerian;">mit dieser Zeile</span>
<p class="msonormal"> </p>
<p class="msonormal"> <span mso-fareast-language="DE">Mit freundlichen Gruessen</span></p>
<span mso-fareast-language="DE">Mit freundlichen Gruessen</span>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<span mso-fareast-language="DE"> </span>
<p class="msonormal"> <span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span></p>
<span mso-fareast-language="DE" style="color:#31849B;">Uasdsaa Aasdsaa</span>
<p class="msonormal"> <span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span></p>
<span mso-fareast-language="DE" style="font-size:9.0pt;">Iasdsaa-Sasdsaa</span>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<span mso-fareast-language="DE"> </span>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span></p>
<span verdana","sans-serif";mso-fareast-language:de"="">asdsaa</span>
<p class="msonormal"> <span verdana","sans-serif";mso-fareast-language:de"="">56218 Mülheim-Kärlich</span></p>
<span verdana","sans-serif";mso-fareast-language:de"="">56218 Mülheim-Kärlich</span>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<span mso-fareast-language="DE"> </span>
<p class="msonormal"> <span mso-fareast-language="DE"> </span></p>
<span mso-fareast-language="DE"> </span>
<p class="msonormal"> </p>
I am using Java 6 and Jsoup 1.7.2.
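This is the expected and correct behavior.
The content is not actually doubled. getAllElements() returns the element it is called on plus every descendant, and each of those elements is printed together with its complete subtree. Every node under body, body included, is treated as an Element in its own right, all the way down to the leaf nodes. A nested element therefore shows up once for itself and once more inside each of its ancestors.
Take <b>Betreff:</b> as an example. It appears 7 times in the output because 7 elements lie on the path from the root (<body>) down to it, counting both <body> and the <b> itself. It is also among the deepest elements in the tree:
body > div > div > div > p > span > b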
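The repetition can be reproduced without Jsoup. The following is only a minimal sketch of the mechanism, not Jsoup's implementation: Node is a hypothetical stand-in for Jsoup's Element, and chain and occurrences are helper names invented for this illustration. It builds the path above, lists the element and all its descendants, and counts how many of them serialize to HTML containing <b>Betreff:</b>.

```java
import java.util.ArrayList;
import java.util.List;

public class AllElementsSketch {

    /** Hypothetical stand-in for an HTML element; this is not Jsoup's Element class. */
    static class Node {
        final String tag;
        final String text; // text placed before the child elements
        final List<Node> children = new ArrayList<Node>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        /** Appends a child and returns that child, so chains can be built step by step. */
        Node add(Node child) {
            children.add(child);
            return child;
        }

        /** Serializes this node together with its whole subtree. */
        String outerHtml() {
            StringBuilder sb = new StringBuilder("<" + tag + ">" + text);
            for (Node c : children) {
                sb.append(c.outerHtml());
            }
            return sb.append("</").append(tag).append(">").toString();
        }

        /** Returns this node followed by all of its descendants, depth first. */
        List<Node> getAllElements() {
            List<Node> out = new ArrayList<Node>();
            out.add(this);
            for (Node c : children) {
                out.addAll(c.getAllElements());
            }
            return out;
        }
    }

    /** Builds a single chain of nested tags from a path such as "body > div > b". */
    static Node chain(String path, String leafText) {
        String[] tags = path.trim().split("\\s*>\\s*");
        Node root = new Node(tags[0], tags.length == 1 ? leafText : "");
        Node current = root;
        for (int i = 1; i < tags.length; i++) {
            boolean isLeaf = (i == tags.length - 1);
            current = current.add(new Node(tags[i], isLeaf ? leafText : ""));
        }
        return root;
    }

    /** Counts the elements whose serialized HTML contains the given snippet. */
    static int occurrences(Node root, String snippet) {
        int count = 0;
        for (Node n : root.getAllElements()) {
            if (n.outerHtml().contains(snippet)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node body = chain("body > div > div > div > p > span > b", "Betreff:");
        // Each element is printed with its subtree, so the <b> shows up in every line.
        for (Node n : body.getAllElements()) {
            System.out.println(n.outerHtml());
        }
        System.out.println(occurrences(body, "<b>Betreff:</b>")); // prints 7
    }
}
```

The count equals the number of elements on the path, which matches the 7 copies in the output above. If the body markup is only needed once, Jsoup's body().html() or body().outerHtml() returns it as a single string, and body().children() returns only the direct children.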