Jsoup parser removes words with '<' and '>'

Question

I'm using Jsoup.parse() to remove HTML tags from a String, but my string also contains a word like <name>.

The problem is that Jsoup.parse() removes that word too, because the text contains < and >. I can't simply strip < and > from the text either. How can I do this?

String s1 = Jsoup.parse("<p>Hello World</p>").text();
//s1 is "Hello World". Correct

String s2 = Jsoup.parse("<name>").text();
// s2 is "". But it should be <name>, because <name> is not an HTML tag

Answer 1

"I'm using the Jsoup.parse() to remove html tags from a String."

You want to use the Jsoup#clean method. You'll also need a little manual work afterwards, because Jsoup will still see <name> as an HTML tag.

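import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // newer jsoup releases call this class Safelist

// Define the list of words to preserve...
String[] myExceptions = { "name" };

// Build a whitelist for Jsoup that lets those words through as tags...
Whitelist myWhiteList = Whitelist.simpleText().addTags(myExceptions);

// Let Jsoup remove any other html tags...
String s2 = Jsoup.clean("<name>", myWhiteList);

// Complete the initial html tags removal: collapse each element Jsoup built
// around a preserved word back to the bare word. Note that any text between
// the opening and closing tag is dropped as well.
for (String word : myExceptions) {
    String open = "<" + word + ">";
    String close = "</" + word + ">";
    // (?s) lets '.' span newlines; .*? also matches an empty element
    s2 = s2.replaceAll("(?s)" + Pattern.quote(open) + ".*?" + Pattern.quote(close), open);
}

System.out.println(">>" + s2);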

OUTPUT

>><name>
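
A possible alternative, sketched here under assumptions rather than taken from the answer above: escape the words you want to keep before the string ever reaches jsoup, so the parser sees &lt;name&gt; as text and no clean-up is needed afterwards. The class name PlaceholderEscaper, the method escapePlaceholders and the PLACEHOLDERS set are hypothetical names chosen for illustration. The sketch only handles bare <word> forms with no attributes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderEscaper {

    // Hypothetical list of words that must survive tag stripping.
    private static final Set<String> PLACEHOLDERS =
            new HashSet<>(Arrays.asList("name"));

    // Matches a bare <word>: no attributes, no slash, no whitespace.
    private static final Pattern BARE_TAG =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)>");

    /** Rewrites <word> as &lt;word&gt; when word is a known placeholder. */
    static String escapePlaceholders(String input) {
        Matcher m = BARE_TAG.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group(1);
            String replacement = PLACEHOLDERS.contains(word.toLowerCase(Locale.ROOT))
                    ? "&lt;" + word + "&gt;"   // keep as literal text
                    : m.group();               // leave real tags untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escapePlaceholders("<p>Hello <name></p>");
        System.out.println(escaped);
        // With jsoup on the classpath, the escaped string can then be parsed:
        // String text = Jsoup.parse(escaped).text();
    }
}
```

Because text() decodes entities, parsing the escaped string should give back the literal <name> alongside the remaining text. The trade-off is the same as in the answer: you have to know the words to preserve in advance.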

Answer 1 metadata: -1, 2016-08-03 10:00:25
