How do I remove the "&#160;" string from HTML using Java?

Question

I want to remove the "&#160;" string from strings that contain HTML tags. I am using something like this:

nonhtml = nonhtml.replaceAll("&#160;", "");

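But with this approach I also have to remove the other variants, such as &nbsp;, one by one. I don't want to use jsoup here, because I know what I am dealing with and which HTML tags are involved. Any leads on this would be helpful.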

Answer 1

The following code is very simple Java, and as far as I can tell the solution you provided works fine for me.

It may help to remember that:

String.replace(CharSequence target, CharSequence replacement)
String.replaceAll(String regex, String replacement)

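...are slightly different. The latter (the one cited in the question) accepts a regular expression, which java.lang.String compiles internally through java.util.regex.Pattern. Look closely at the Java documentation for class String to see which of its replace methods take a regular expression rather than a plain string as their first parameter (search for the JDK Javadoc for java.lang.String).

For me, this code works fine. However, if the first argument to replaceAll is changed and the new value contains any characters that need regex escaping, the call may throw an exception or match something other than what was intended. For this simple case, it is fine. See: replaceAll(...) and replace(...).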
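The difference between the two methods can be sketched as follows. This is a minimal illustration, not code from the question: the class name `ReplaceVsReplaceAll` and the helpers `stripLiteral` and `stripQuoted` are hypothetical names chosen for the example. `Pattern.quote` is the standard way to make `replaceAll` treat its first argument literally.

```java
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // Literal replacement: no character in target has a special meaning.
    static String stripLiteral(String s, String target) {
        return s.replace(target, "");
    }

    // Regex replacement, with target quoted so metacharacters match literally.
    static String stripQuoted(String s, String target) {
        return s.replaceAll(Pattern.quote(target), "");
    }

    public static void main(String[] args) {
        String s = "a.b&nbsp;c";
        System.out.println(stripLiteral(s, "."));   // ab&nbsp;c
        System.out.println(s.replaceAll(".", ""));  // prints an empty line: "." matches every character
        System.out.println(stripQuoted(s, "."));    // ab&nbsp;c
    }
}
```

Since "&nbsp;" and "&#160;" contain no regex metacharacters, either method gives the same result for them; `replace` is simply the safer default when no pattern matching is needed.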

String nonhtml = "<HTML>\n<BODY>\n<H1>Simple Test</H1>\n&nbsp;&nbsp;Hello,<BR />\n<PRE> how are you? </PRE>\n&#160;&#160;</BODY></HTML>\n";
System.out.println(nonhtml);
System.out.println(nonhtml.replaceAll("&nbsp;", "").replaceAll("&#160;", ""));

This produced the following output:

<HTML>
<BODY>
<H1>Simple Test</H1>
&nbsp;&nbsp;Hello,<BR />
<PRE> how are you? </PRE>
&#160;&#160;</BODY></HTML>

<HTML>
<BODY>
<H1>Simple Test</H1>
Hello,<BR />
<PRE> how are you? </PRE>
</BODY></HTML>
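If chaining one call per variant becomes tedious, the variants can be folded into a single precompiled pattern. The sketch below is one possible approach under stated assumptions: the class name `NbspStripper` and method `stripNbsp` are made up for illustration, and covering the hexadecimal form `&#xA0;` is an addition that the question does not mention.

```java
import java.util.regex.Pattern;

public class NbspStripper {
    // Matches the named entity &nbsp;, the decimal form &#160;,
    // and the hexadecimal form &#xA0; (x and A in either case).
    private static final Pattern NBSP =
            Pattern.compile("&(?:nbsp|#160|#[xX][aA]0);");

    // Removes every non-breaking-space entity in a single pass.
    static String stripNbsp(String html) {
        return NBSP.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "&nbsp;&nbsp;Hello,<BR />&#160;&#xA0;";
        System.out.println(stripNbsp(html)); // Hello,<BR />
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regular expression on every call, which is what each `String.replaceAll` invocation does.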
