How to replace HTML tags with characters using jsoup in Java

I am using Java code to extract information from the web for processing, and I use the jsoup library to clean the HTML tags from the responses I get from websites. To extract information from these responses, I need to replace the HTML tags with a rarely used character such as '~'.

So here's my question:

How do I convert this:

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

Into this:

   ~This is heading 1~
   ~This is heading 2~
   ~This is heading 3~
   ~This is heading 4~
   ~This is heading 5~
   ~This is heading 6~

using jsoup?

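import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Add your own selector here. The headings from the example are only an assumption.
String cssSelector = "h1, h2, h3, h4, h5, h6";
Document doc = Jsoup.parse(html); // html is the String holding the markup to process
Elements elms = doc.select(cssSelector);
for (Element elm : elms) {
    System.out.println("~" + elm.text() + "~"); // text() returns the element's text without tags
}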

Update

If you want to replace all tags, you can do this:

html = html.replaceAll("<[^>]*>", "~");
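
For reference, here is a minimal, self-contained sketch of the regex approach above, using only the standard library. The class name `TagReplacer` and the method `replaceTags` are hypothetical names chosen for illustration; they are not part of jsoup. Note that the pattern emits one marker per tag, so nested tags produce several markers in a row, and a `>` inside an attribute value would end the match early.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): replaces every HTML tag with a marker.
final class TagReplacer {
    // Matches anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String replaceTags(String html, String marker) {
        // quoteReplacement keeps '$' and '\' in the marker from being read as group references.
        return TAG.matcher(html).replaceAll(Matcher.quoteReplacement(marker));
    }

    public static void main(String[] args) {
        String html = "<h1>This is heading 1</h1>\n<h2>This is heading 2</h2>";
        // Each opening and closing tag becomes one '~'.
        System.out.println(replaceTags(html, "~"));
    }
}
```

If the input contains nested markup such as `<p>a <b>b</b></p>`, the result is `~a ~b~~`, so the jsoup selector approach is probably the better fit when only specific elements should be wrapped.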
