
How to remove certain html tags from a String with replaceAll?

I have a string including different kinds of html tags.

I want to remove all <a> and </a> tags.

I tried:

string.replaceAll("<a>", "");
string.replaceAll("</a>", "");

But it doesn't work. Those tags still remain in the string. Why?

Those tags still remain in the string. Why?

Because replaceAll doesn't modify the string directly (it can't, strings are immutable), it returns the modified string. So:

string = string.replaceAll("<a>", "");
string = string.replaceAll("</a>", "");


Or

string = string.replaceAll("<a>", "").replaceAll("</a>", "");
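A minimal sketch of the difference between discarding and keeping the return value. The class name, the helper stripAnchorTags, and the sample input are made up for illustration and are not part of the original answer.

```java
class ReplaceReturnsNewString {
    // Hypothetical helper: returns a copy of the input without the literal tags <a> and </a>.
    static String stripAnchorTags(String html) {
        // Each replaceAll call returns a new String, so the calls can be chained.
        return html.replaceAll("<a>", "").replaceAll("</a>", "");
    }

    public static void main(String[] args) {
        String original = "Click <a>here</a> now";

        // Result discarded: Strings are immutable, so 'original' is unchanged.
        original.replaceAll("<a>", "");
        System.out.println(original); // Click <a>here</a> now

        // Result kept: the returned String has the tags removed.
        String cleaned = stripAnchorTags(original);
        System.out.println(cleaned);  // Click here now
    }
}
```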

Note that replaceAll takes a string defining a regular expression as its first argument. "<a>" and "</a>" are both fine as patterns, but unless you need a regular expression, use replace(CharSequence, CharSequence) instead, which treats its first argument literally. If you do use replaceAll, be aware of the characters with special meaning in regular expressions (for example . * + ? ( ) [ ] and \).
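To illustrate why the distinction matters, here is a small sketch comparing the two methods on a string that contains a regex metacharacter. The class and method names are hypothetical. Pattern.quote is the standard way to make replaceAll treat its pattern literally, although plain replace is usually simpler.

```java
import java.util.regex.Pattern;

class LiteralVersusRegex {
    // Literal removal: no character in 'target' is treated specially.
    static String removeLiteral(String text, String target) {
        return text.replace(target, "");
    }

    // Regex-based removal of a literal: Pattern.quote escapes any metacharacters in 'target'.
    static String removeQuoted(String text, String target) {
        return text.replaceAll(Pattern.quote(target), "");
    }

    public static void main(String[] args) {
        String text = "a.b<a>c</a>";

        System.out.println(removeLiteral(text, "<a>")); // a.bc</a>
        System.out.println(removeLiteral(text, "."));   // ab<a>c</a>
        System.out.println(removeQuoted(text, "."));    // ab<a>c</a>

        // As a regex, "." matches every character, so everything is removed.
        System.out.println(text.replaceAll(".", "").isEmpty()); // true
    }
}
```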

In fact, you can do it with one replaceAll by making use of the fact you're using regular expressions:

string = string.replaceAll("</?a>", "");

The ? after the / makes the / optional, so the pattern matches both "<a>" and "</a>".
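A sketch of that single-pattern approach, precompiled for reuse. The second pattern is an assumption that goes beyond the answer: "</?a>" matches only bare tags, so it leaves an opening tag such as <a href="..."> untouched. The hypothetical variant below also accepts attributes, but it is still a regex heuristic (an attribute value containing > would defeat it), not a substitute for an HTML parser. All class, field, and method names are invented for this example.

```java
import java.util.regex.Pattern;

class AnchorTagStripper {
    // The pattern from the answer: matches exactly "<a>" or "</a>".
    private static final Pattern BARE_ANCHOR = Pattern.compile("</?a>");

    // Hypothetical extension (not from the answer): optional whitespace-led attributes.
    // Requiring whitespace right after "a" keeps tags like <abbr> from matching.
    private static final Pattern ANCHOR_WITH_ATTRS = Pattern.compile("</?a(\\s[^>]*)?>");

    static String stripBare(String html) {
        return BARE_ANCHOR.matcher(html).replaceAll("");
    }

    static String stripWithAttrs(String html) {
        return ANCHOR_WITH_ATTRS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String bare = "<p>See <a>this</a></p>";
        String withHref = "<p>See <a href=\"/x\">this</a> <abbr>ok</abbr></p>";

        System.out.println(stripBare(bare));          // <p>See this</p>
        System.out.println(stripBare(withHref));      // <p>See <a href="/x">this <abbr>ok</abbr></p>
        System.out.println(stripWithAttrs(withHref)); // <p>See this <abbr>ok</abbr></p>
    }
}
```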


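To remove all HTML tags rather than just <a>, another answer proposes replaceAll("\\<\\w*\\>", "\\ ").replaceAll("\\", "\\ "). The first call replaces each attribute-free opening tag with a space. The second call's pattern appears to have been lost in posting: as written, "\\" is a lone regex backslash, which makes replaceAll throw a PatternSyntaxException. Note that each regex backslash must be written as two ("\\") in a Java string literal.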


 