
How do I use Rhino to remove <script> tags?

I have an HTML email message that I parse using Jsoup:

Jsoup.parse(bizmsg.getMessageBody()).text()

But this does not remove script tags such as:

<script>
document.write("Bazinga!")
</script>

I have been using a regex like this:

String(v).replace(/(?:<script.*?>)((\n|\r|.)*?)(?:<\/script>)/ig, "");

to remove scripts successfully. However, I came across the question "JSoup to parse <script> tag".

How do I use Rhino to parse scripts? A code sample would be very helpful, thanks.
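
For comparison, the JavaScript regex in the question can be approximated in plain Java with java.util.regex. This is only a rough sketch: the class name ScriptRegexDemo and the method stripScripts are made up for illustration, and the DOTALL flag stands in for the (\n|\r|.) alternation, which also lets the opening tag span lines. Like any regex-based HTML cleanup, it can be fooled by unusual markup.

```java
import java.util.regex.Pattern;

class ScriptRegexDemo {
    // CASE_INSENSITIVE mirrors the "i" flag; DOTALL lets '.' match line breaks,
    // so multi-line script bodies are covered. (Hypothetical helper, not from the post.)
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("<script.*?>.*?</script>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // replaceAll plays the role of the "g" flag: every script block is removed.
    static String stripScripts(String html) {
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Hi</p><script>\ndocument.write(\"Bazinga!\")\n</script><p>Bye</p>";
        System.out.println(stripScripts(html));
    }
}
```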

You don't need Rhino to remove <script> tags. Select them with a simple CSS selector in Jsoup and remove the resulting nodes. Here is a minimal example using www.google.com:

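import java.io.IOException;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException {
    // Fetch and parse the page, with a 5-second timeout.
    Document doc = Jsoup.parse(new URL("http://www.google.com"), 5000);
    // Select every <script> element and detach it from the document.
    Elements elems = doc.select("script");
    for (Element elem : elems)
        elem.remove();
    System.out.println(doc);
}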
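
The same idea can be applied to an HTML string such as the email body from the question, rather than a URL. The following is a minimal sketch, assuming Jsoup is on the classpath; the class EmailTextExtractor, the method visibleText, and the sample markup are hypothetical and only serve the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class EmailTextExtractor {
    // Parse the HTML, drop all <script> elements, then return the remaining text.
    static String visibleText(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("script").remove(); // Elements.remove() detaches every match
        return doc.text();
    }

    public static void main(String[] args) {
        // Sample markup standing in for bizmsg.getMessageBody()
        String body = "<p>Hello</p><script>document.write(\"Bazinga!\")</script><p>World</p>";
        System.out.println(visibleText(body));
    }
}
```

Calling remove() on the selected Elements is equivalent to the explicit loop above and keeps the helper short.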
