
jsoup: removing iframe tags

I am using jsoup 1.6.1 and have run into a problem when removing an iframe tag from HTML. When the iframe has no body (i.e. <iframe pro=value />), the remove() method also removes all the content after that tag. Here is my sample code.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<p> This is start.</p><iframe frameborder=\"0\" marginheight=\"0\" /><p> This is end</p>";
Document doc = Jsoup.parse(html); // the second String argument of parse() is a base URI, not a charset
doc.select("iframe").remove();
System.out.println(doc.text());

It prints:

This is start.

But I expected:

This is start. This is end

Thanks in advance

It appears that the closing tag for iframe is required. You can't use a self-closing tag:

http://msdn.microsoft.com/en-us/library/ie/ms535258(v=vs.85).aspx
http://stackoverflow.com/questions/923328/line-after-iframe-is-not-visible
http://www.w3resource.com/html/iframe/HTML-iframe-tag-and-element.php

So jsoup is following the spec: the trailing slash on a non-void element such as iframe is ignored, and whatever follows the iframe start tag is taken as its body. When you remove the iframe, "This is end" gets removed along with it.
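If you cannot change the markup at its source, one possible workaround is to rewrite self-closing iframe tags into an explicit open/close pair before handing the string to jsoup. The following is only a minimal sketch: the class name IframeFix and the method closeSelfClosingIframes are made up for illustration, and the regular expression is a heuristic that assumes attribute values do not contain a ">" character.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): turns <iframe ... /> into <iframe ...></iframe>.
final class IframeFix {

    // Matches an iframe start tag that ends in "/>", capturing its attributes.
    // Assumption: no attribute value contains '>'.
    private static final Pattern SELF_CLOSING_IFRAME =
            Pattern.compile("<iframe\\b([^>]*?)\\s*/>", Pattern.CASE_INSENSITIVE);

    private IframeFix() {
    }

    static String closeSelfClosingIframes(String html) {
        // $1 re-inserts the captured attributes unchanged.
        return SELF_CLOSING_IFRAME.matcher(html).replaceAll("<iframe$1></iframe>");
    }
}
```

You would then call Jsoup.parse(IframeFix.closeSelfClosingIframes(html)) and run doc.select("iframe").remove() as before. Since the iframe now has an explicit closing tag, the paragraph that follows it should no longer be parsed as the iframe's body.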
