
How to replace HTML tags with characters using jsoup in Java

I am using Java code to extract information from the web for processing, and I am using the jsoup library to clean the HTML tags out of the responses I get from websites. To extract information from that markup, I need to replace the HTML tags with a rarely used character such as '~'.

So here's my question:

How do I convert this:

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

Into this:

   ~This is heading 1~
   ~This is heading 2~
   ~This is heading 3~
   ~This is heading 4~
   ~This is heading 5~
   ~This is heading 6~

using jsoup?

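import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Adjust the selector to your page; this one matches the headings from the question.
String cssSelector = "h1, h2, h3, h4, h5, h6";
Document doc = Jsoup.parse(html); // html holds the markup fetched from the site
Elements elms = doc.select(cssSelector);
for (Element elm : elms) {
    System.out.println("~" + elm.text() + "~");
}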

Update

If you want to replace all tags, whatever the element, you can do this:

html = html.replaceAll("<[^>]*>", "~");
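To see what that one-liner produces, here is a minimal, self-contained sketch using only the Java standard library. The class name TagReplacer and the method replaceTags are hypothetical names made up for this illustration; they are not part of jsoup. The sketch assumes the markup is simple, like the headings in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of jsoup.
class TagReplacer {
    // Matches anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replaces every tag-like span in html with the given marker string.
    static String replaceTags(String html, String marker) {
        // quoteReplacement keeps markers such as "$" from being read as group references.
        return TAG.matcher(html).replaceAll(Matcher.quoteReplacement(marker));
    }

    public static void main(String[] args) {
        String html = "<h1>This is heading 1</h1>\n<h2>This is heading 2</h2>";
        // Prints the two headings, each wrapped in '~', on separate lines.
        System.out.println(replaceTags(html, "~"));
        // Nested tags produce adjacent markers: prints ~~bold~~
        System.out.println(replaceTags("<p><b>bold</b></p>", "~"));
    }
}
```

Note that the regular expression knows nothing about HTML structure: nested tags give doubled markers, entities such as &amp;amp; are left undecoded, and a '>' inside an attribute value ends the match early. For anything beyond simple markup, the jsoup selector approach above is likely the safer choice.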

