
Cleaning up HTML source parsed from a website with jsoup

When parsing a site with jsoup in Java, I want to remove the trailing /> from each img tag, so that <img src="#"/> becomes <img src="#">.

Source:

<div>
    <a href="#">ABC</a> 
    <a href="#"><img src="#"/></a>
    <br/>
</div>

Result:

<div>
    <a href="#">ABC</a> 
    <a href="#"><img src="#"></a>
    <br/>
</div>

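Try the html() method. By default jsoup serializes in HTML syntax, so void elements such as img are written without the trailing slash. Note that <br/> is written as <br> as well, and that Jsoup.parse() wraps the input in html, head, and body elements.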

import org.jsoup.Jsoup;

public class Test {
    public static void main(String[] args) {
        String s = "<div>\n" +
                "    <a href=\"#\">ABC</a> \n" +
                "    <a href=\"#\"><img src=\"#\"/></a>\n" +
                "    <br/>\n" +
                "</div>";
        // Parse the markup and print it back out in jsoup's default HTML syntax.
        System.out.println(Jsoup.parse(s).html());
    }
}

Output:

<html>
 <head></head>
 <body>
  <div> 
   <a href="#">ABC</a> 
   <a href="#"><img src="#"></a> 
   <br> 
  </div>
 </body>
</html>
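
The expected result in the question keeps <br/> unchanged, while the html() output above rewrites it to <br> too. If only the img tags should change and everything else must stay byte-for-byte the same, one alternative is a plain string replacement without jsoup. This is only a minimal sketch, not part of the original answer: the class name ImgSlashStripper and the method stripImgSlash are made up for illustration, and it assumes simple markup where no img attribute value contains a ">" character. A regex is not a real HTML parser.

```java
import java.util.regex.Pattern;

public class ImgSlashStripper {
    // Hypothetical helper, not a jsoup API.
    // Matches an <img ...> tag that ends in "/>" and captures everything
    // between "<img" and the optional whitespace before the slash.
    private static final Pattern SELF_CLOSED_IMG =
            Pattern.compile("<img\\b([^>]*?)\\s*/>", Pattern.CASE_INSENSITIVE);

    // Rewrites <img .../> as <img ...> and leaves all other text untouched.
    static String stripImgSlash(String html) {
        return SELF_CLOSED_IMG.matcher(html).replaceAll("<img$1>");
    }

    public static void main(String[] args) {
        String s = "<div>\n" +
                "    <a href=\"#\">ABC</a> \n" +
                "    <a href=\"#\"><img src=\"#\"/></a>\n" +
                "    <br/>\n" +
                "</div>";
        System.out.println(stripImgSlash(s));
    }
}
```

Run on the sample from the question, this prints the input with only the img tag changed, so <br/> keeps its slash. For real-world pages with messy markup, the jsoup approach above is the safer choice.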
