How to remove the “ ” string from HTML using Java?

I want to remove the " " string from a string that is inside an HTML tag. I am using something like:

nonhtml = nonhtml.replaceAll(" ", "");

But with this approach I have to specifically remove the other variants as well, such as &nbsp. I do not want to use jsoup here, as I know exactly which HTML tags I am dealing with. Any clue on this would be helpful.

The following code is pretty simple Java, and it looks to me like the solution you have provided works just fine on my end.

Perhaps remember:

  • String.replace(CharSequence target, CharSequence replacement)
  • String.replaceAll(String regex, String replacement)

... are slightly different. The latter (the one cited in the question) accepts a regular expression, which class java.lang.String compiles internally (via java.util.regex.Pattern). Look closely at the Java documentation pages for class String to see where a regular expression, not a simple String, is used as a parameter in the "replace string" methods. (Google "JDK Java-Docs for java.lang.String".)
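To make the difference concrete, here is a minimal sketch. The class name `ReplaceVsReplaceAll`, its helper methods, and the sample string are made up for illustration and are not part of the original answer. It contrasts literal replacement with regex replacement on a string where a regex metacharacter matters, and uses `Pattern.quote` to make `replaceAll` treat its first argument literally.

```java
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // Literal replacement: every character of target is matched as-is.
    static String literal(String s, String target, String with) {
        return s.replace(target, with);
    }

    // Regex replacement: the pattern argument is compiled as a regular expression.
    static String regex(String s, String pattern, String with) {
        return s.replaceAll(pattern, with);
    }

    // Regex replacement with the search text quoted, so metacharacters lose their meaning.
    static String quoted(String s, String text, String with) {
        return s.replaceAll(Pattern.quote(text), with);
    }

    public static void main(String[] args) {
        String s = "a.b&nbsp;c";
        System.out.println(literal(s, ".", "-"));   // a-b&nbsp;c
        System.out.println(regex(s, ".", "-"));     // ---------- ("." matches any character)
        System.out.println(quoted(s, ".", "-"));    // a-b&nbsp;c
        // "&nbsp;" contains no regex metacharacters, so both methods agree on it.
        System.out.println(literal(s, "&nbsp;", "").equals(regex(s, "&nbsp;", "")));  // true
    }
}
```

When the search text is a fixed string such as an HTML entity, `replace` is usually the simpler choice, since nothing has to be escaped.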

The code below works just fine for me. However, if the first parameter to the replaceAll method were changed, and the change included any characters that require escaping in a regular expression, then the method could fail or match something unintended. For this simple case, it was fine. See: replaceAll(...) and replace(...)

String nonhtml = "<HTML>\n<BODY>\n<H1>Simple Test</H1>\n&nbsp;&nbsp;Hello,<BR />\n<PRE> how are you? </PRE>\n&#160;&#160;</BODY></HTML>\n";
System.out.println(nonhtml);
System.out.println(nonhtml.replaceAll("&nbsp;", "").replaceAll("&#160;", ""));

Produced this output:

<HTML>
<BODY>
<H1>Simple Test</H1>
&nbsp;&nbsp;Hello,<BR />
<PRE> how are you? </PRE>
&#160;&#160;</BODY></HTML>

<HTML>
<BODY>
<H1>Simple Test</H1>
Hello,<BR />
<PRE> how are you? </PRE>
</BODY></HTML>
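
If several spellings of the entity need to be removed, as the question suggests, one option is a single precompiled pattern instead of chained calls. This is only a sketch and not part of the original answer: `NbspStripper` and `stripNbsp` are hypothetical names, and it assumes that only the named, decimal, and hexadecimal forms with a trailing semicolon occur. Literal U+00A0 characters and forms without the semicolon are left untouched.

```java
import java.util.regex.Pattern;

public class NbspStripper {
    // Assumption: only these three spellings of the non-breaking space entity appear:
    // &nbsp;  &#160;  &#xA0; (hex digits in either case).
    private static final Pattern NBSP_ENTITY =
            Pattern.compile("&(?:nbsp|#160|#[xX][aA]0);");

    // Removes every match of the pattern; all other markup is left as-is.
    static String stripNbsp(String html) {
        return NBSP_ENTITY.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String nonhtml = "&nbsp;&nbsp;Hello,<BR />\n&#160;&#xA0;</BODY>";
        System.out.println(stripNbsp(nonhtml));
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which is what `String.replaceAll` would otherwise do each time.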
