How to remove the "&nbsp;" string from HTML using Java?

I want to remove the "&nbsp;" string from a string that sits inside an HTML tag. I am using something like:

nonhtml = nonhtml.replaceAll("&nbsp;", "");

With this approach, though, I have to remove the other forms explicitly as well, such as &nbsp. I do not want to use jsoup here, because I know exactly which HTML tags I am dealing with. Any pointers would be helpful.

This is pretty simple Java, and as far as I can tell the solution you have provided works just fine on my end.

Perhaps remember that:

  • String.replace(CharSequence target, CharSequence replacement)
  • String.replaceAll(String regex, String replacement)

... are slightly different. The latter (the one cited in the question) accepts a regular expression, which String passes to java.util.regex.Pattern to be compiled. Look closely at the Javadoc for class String to see which of the replace methods take a regular expression, rather than a plain string, as a parameter (search for the JDK Javadoc for java.lang.String).
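To make the difference concrete, here is a minimal sketch. The class name ReplaceVsReplaceAll and the helper methods removeLiteral and removeRegex are made up for illustration; only String.replace, String.replaceAll and Pattern.quote come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ReplaceVsReplaceAll {

    // Literal removal: no character in target has a special meaning.
    static String removeLiteral(String input, String target) {
        return input.replace(target, "");
    }

    // Regex removal: the second argument is compiled as a regular expression.
    static String removeRegex(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // "&nbsp;" contains no regex metacharacters, so both calls agree.
        System.out.println(removeLiteral("a&nbsp;b", "&nbsp;")); // ab
        System.out.println(removeRegex("a&nbsp;b", "&nbsp;"));   // ab

        // "(x)" is a capturing group in a regex, so only the "x" is matched.
        System.out.println(removeLiteral("f(x)", "(x)"));        // f
        System.out.println(removeRegex("f(x)", "(x)"));          // f()

        // Pattern.quote makes replaceAll treat the text literally again.
        System.out.println(removeRegex("f(x)", Pattern.quote("(x)"))); // f
    }
}
```

For a fixed piece of text such as "&nbsp;", replace is probably the safer default, since nothing in the target ever needs escaping.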

The code below works just fine for me. However, if the first argument to replaceAll were changed to include any characters that have a special meaning in a regular expression, the call would no longer behave as a literal replacement unless those characters were escaped. For this simple case, it is fine. See: replaceAll(...) and replace(...)

String nonhtml = "<HTML>\n<BODY>\n<H1>Simple Test</H1>\n&nbsp;&nbsp;Hello,<BR />\n<PRE> how are you? </PRE>\n&#160;&#160;</BODY></HTML>\n";
System.out.println(nonhtml);
System.out.println(nonhtml.replaceAll("&nbsp;", "").replaceAll("&#160;", ""));

Produced this output:

<HTML>
<BODY>
<H1>Simple Test</H1>
&nbsp;&nbsp;Hello,<BR />
<PRE> how are you? </PRE>
&#160;&#160;</BODY></HTML>

<HTML>
<BODY>
<H1>Simple Test</H1>
Hello,<BR />
<PRE> how are you? </PRE>
</BODY></HTML>
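If chaining one replaceAll call per form becomes tedious, the common spellings of the non-breaking space could be folded into a single pattern. This is only a sketch under assumptions: the class NbspStripper and the method stripNbsp are hypothetical names, and the pattern covers just "&nbsp;", the decimal reference "&#160;", the hex reference "&#xA0;" (leading zeros allowed) and the raw U+00A0 character. References without the trailing semicolon are deliberately left alone, since matching them could clip longer numeric references such as "&#1600;".

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the original answer.
class NbspStripper {

    // Named, decimal and hex references for U+00A0, plus the raw character.
    private static final Pattern NBSP = Pattern.compile(
            "&(?:nbsp|#0*160|#[xX]0*[aA]0);|\\u00A0");

    // Removes every match of the pattern above from the input.
    static String stripNbsp(String html) {
        return NBSP.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String nonhtml = "<HTML>\n<BODY>\n<H1>Simple Test</H1>\n&nbsp;&nbsp;Hello,<BR />\n"
                + "<PRE> how are you? </PRE>\n&#160;&#xA0;\u00A0</BODY></HTML>\n";
        System.out.println(stripNbsp(nonhtml));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which replaceAll on String would otherwise do.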

