Jsoup parser removes words with '<' and '>'
I'm using Jsoup.parse() to remove HTML tags from a String. But my string also contains a word like <name>.
The problem is that Jsoup.parse() removes that too, because the text contains < and >. I can't just strip < and > from the text either. How can I do this?
String s1 = Jsoup.parse("<p>Hello World</p>").text();
//s1 is "Hello World". Correct
String s2 = Jsoup.parse("<name>").text();
//s2 is "". But it should be <name>, because <name> is not an HTML tag
You want to use the Jsoup#clean method. You'll also need a little manual work afterwards, because Jsoup will still see <name> as an HTML tag.
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // named Safelist in newer jsoup releases

// Define the list of words to preserve...
String[] myExceptions = new String[] { "name" };
// Build a whitelist for Jsoup that keeps those words as tags...
Whitelist myWhiteList = Whitelist.simpleText().addTags(myExceptions);
// Let Jsoup remove any other html tags...
String s2 = Jsoup.clean("<name>", myWhiteList);
// Complete the initial html tags removal: collapse each "<word>...</word>" pair back to "<word>"...
for (String word : myExceptions) {
    s2 = s2.replaceAll("<" + word + ">.+?</" + word + ">", "<" + word + ">");
}
System.out.println(">>" + s2);
>><name>
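Since Jsoup treats <name> as a tag no matter what, another option is to escape such words before the text ever reaches Jsoup. The following is only a rough sketch of that idea using plain java.util.regex, not part of the answer above. The class name EscapeUnknownTags, the method escapeUnknownTags, and the short KNOWN_TAGS list are all made up for illustration; a real list of HTML tag names would need to be much longer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeUnknownTags {
    // Assumption: a deliberately short, hypothetical list of names treated as real HTML tags.
    private static final Set<String> KNOWN_TAGS =
            new HashSet<>(Arrays.asList("p", "b", "i", "br", "div", "span", "a"));

    // Matches an opening or closing tag: optional slash, tag name, then anything up to '>'.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>");

    // Replaces the angle brackets of every tag whose name is not in KNOWN_TAGS with entities.
    static String escapeUnknownTags(String input) {
        Matcher m = TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String replacement = KNOWN_TAGS.contains(name)
                    ? m.group() // real HTML tag: leave untouched
                    : "&lt;" + m.group(1) + m.group(2) + m.group(3) + "&gt;";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p>Hello &lt;name&gt;</p>
        System.out.println(escapeUnknownTags("<p>Hello <name></p>"));
    }
}
```

The escaped string can then be handed to Jsoup.parse(...).text() as in the question. Because text() decodes entities, &lt;name&gt; should come back as <name> while the real tags are stripped.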