
How can I remove HTML elements while retaining the formatting?

I am trying to use the JavaMail API to read the body of a message and store it in a text file if it has any content.

I am able to read the body of the message, but it comes with some HTML elements.

Below is the code I used.

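// Requires javax.mail.*, javax.mail.search.FlagTerm, java.util.Properties and org.jsoup.Jsoup
Properties props = System.getProperties();
props.setProperty("mail.store.protocol", "imaps");

Session session = Session.getDefaultInstance(props, null);
Store store = session.getStore("imaps");
store.connect("hostname", "username", "password");
Folder inbox = store.getFolder("Inbox");
inbox.open(Folder.READ_ONLY);
// Fetch only the messages that are not yet marked as seen
Message[] messages = inbox.search(new FlagTerm(new Flags(Flags.Flag.SEEN), false));
for (Message message : messages) {
    // Jsoup.parse expects a String, not a Message; this assumes a single-part body whose content is text
    System.out.println(Jsoup.parse(message.getContent().toString()).text());
}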

How can I remove those HTML elements from the retrieved message?

Can anyone please help me solve this?

To remove all HTML tags from your mail, use jsoup's text() method.

Example Code

String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";

System.out.println(Jsoup.parse(htmlString).text());

Output

Hi Data is written in this mail.
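
The question also mentions storing the body in a text file only if it has content. A minimal sketch of that step follows, using only the standard library; the class name BodyWriter and the method writeIfNotBlank are hypothetical names chosen for illustration, and the text is assumed to be the String returned by text().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of jsoup or JavaMail.
class BodyWriter {

    // Writes the text to the target file only when it contains
    // non-whitespace characters. Returns true if the file was written.
    static boolean writeIfNotBlank(String text, Path target) throws IOException {
        if (text == null || text.trim().isEmpty()) {
            return false; // nothing worth storing
        }
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real destination path.
        Path target = Files.createTempFile("mail-body", ".txt");
        boolean written = writeIfNotBlank("Hi Data is written in this mail.", target);
        System.out.println(written + " -> " + target);
    }
}
```

Checking for blank text after stripping the tags, rather than before, avoids creating files for messages whose body consists only of markup.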

If specific elements should result in line breaks, as they do in the rendered HTML, you can insert a line break before each of those tags and then disable pretty printing when you call jsoup's clean method.

prettyPrint

If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.

Example Code

String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";

htmlString = htmlString.replaceAll("<br>", System.getProperty("line.separator") + "<br>"); // do replacements for all tags that should result in line-breaks

Document.OutputSettings settings = new Document.OutputSettings();
settings.prettyPrint(false); // to keep line-breaks

// Whitelist was renamed to Safelist in newer jsoup releases
String cleanedSource = Jsoup.clean(htmlString, "", Whitelist.none(), settings);

System.out.println(cleanedSource);

Output

 Hi



 Data is written in this mail.
[... four more empty lines]
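
The replaceAll call above only matches the literal string <br>. Mail clients often emit variants such as <br/>, <br /> or <BR>, so a regular expression may be more robust. The following is a sketch under that assumption; LineBreakMarker and markLineBreaks are hypothetical names, and treating a closing </p> as a line break is an illustrative choice rather than something the answer prescribes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: marks where line breaks belong
// before the markup is stripped.
class LineBreakMarker {

    // Matches <br>, <br/>, <br /> and closing </p> tags, ignoring case.
    private static final Pattern BREAK_TAGS =
            Pattern.compile("(?i)<br\\s*/?>|</p\\s*>");

    // Inserts the given line separator in front of every matching tag.
    static String markLineBreaks(String html, String lineSeparator) {
        // quoteReplacement guards against '$' or '\' in the separator;
        // $0 re-emits the matched tag unchanged.
        return BREAK_TAGS.matcher(html)
                .replaceAll(Matcher.quoteReplacement(lineSeparator) + "$0");
    }

    public static void main(String[] args) {
        String html = "<p>Hi<br/>Data is written in this mail.<BR></p>";
        System.out.println(markLineBreaks(html, System.lineSeparator()));
    }
}
```

The returned string can then be passed to Jsoup.clean with prettyPrint(false), as in the example above, so that the inserted separators survive the cleaning step.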
