I have a regular expression that removes all non-alphanumeric characters. It works for every special character except ^. This is the expression I am using:
String strRefernce = strReference.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "").toUpperCase();
I tried modifying it to
String strRefernce = strReference.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]\\^", "").toUpperCase();
and
String strRefernce = strReference.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}\\^]", "").toUpperCase();
But neither of these removes the symbol. Can someone please help me with this?
The first ^ inside [^...] is a negation mark: it makes the character class a negated one (matching characters other than those inside).
The second ^ inside the class is treated as a literal caret. That puts the caret in the excluded set, so the regex never matches it. Remove it, and the caret will be matched:
"[^\\p{IsAlphabetic}\\p{IsDigit}]"
or even shorter:
"(?U)\\P{Alnum}"
The \\P{Alnum} class stands for any character other than an alphanumeric character, that is, anything outside [\\p{Alpha}\\p{Digit}] (see Java regex reference). When you pass (?U), \\p{Alnum} becomes Unicode-aware, so \\P{Alnum} will not match (and will not remove) Unicode letters and digits. See this IDEONE demo.
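To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the two corrected ones. The class name, method names and sample input are made up for illustration, and Locale.ROOT is added only so the uppercase step does not depend on the default locale.

```java
import java.util.Locale;

class CaretRemovalDemo {
    // Pattern from the question: the second ^ is a literal inside the
    // negated class, so carets are excluded from matching and survive.
    static String withStrayCaret(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "")
                .toUpperCase(Locale.ROOT);
    }

    // Corrected pattern: only the leading ^ remains, as the negation mark.
    static String fixed(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "")
                .toUpperCase(Locale.ROOT);
    }

    // Shorter form: Unicode-aware "anything that is not alphanumeric".
    static String shorter(String s) {
        return s.replaceAll("(?U)\\P{Alnum}", "")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String input = "ab^c-12_\u00e9"; // \u00e9 is a Latin small e with acute
        System.out.println(withStrayCaret(input)); // caret is still there
        System.out.println(fixed(input));          // caret removed
        System.out.println(shorter(input));        // same result as fixed
    }
}
```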
Add a + at the end if you want to remove whole chunks of symbols other than \\p{IsAlphabetic} and \\p{IsDigit}.
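With an empty replacement the + quantifier does not change the result, but it does once the replacement is non-empty. A small sketch of that difference, using a hypothetical class and a space as the replacement (the space is an assumption for illustration; the question replaces with an empty string):

```java
class ChunkDemo {
    // Without +: every unwanted character is replaced individually.
    static String perChar(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", " ");
    }

    // With +: a whole run of unwanted characters is replaced once.
    static String perChunk(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]+", " ");
    }

    public static void main(String[] args) {
        String input = "a^--b";
        System.out.println("[" + perChar(input) + "]");  // three spaces between a and b
        System.out.println("[" + perChunk(input) + "]"); // one space between a and b
    }
}
```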
This works as well.
System.out.println("Text 尖酸[刻薄 ^, More _0As text °Ñ\"Ñ".replaceAll("(?U)[^[\\W_]]+", " "));
Output
Text 尖酸 刻薄 More 0As text Ñ Ñ
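I am not sure, but the word class (\\w) might be the more comprehensive list of alphanumeric characters.
[\\W_] is a class containing non-word characters and the underscore.
When it is nested inside a negated Java class, [^[\\W_]] is a negated class formed from a union between nothing and the nested class [\\W_]. In Java 8 and earlier the ^ is not applied to the nested class, so the pattern matches the same characters as [\\W_], which is what the output above shows. From Java 9 on, the negation also covers nested classes, so the pattern then matches the opposite set; there, writing [\\W_] directly gives this result.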
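As a sketch that does not rely on how nested classes interact with negation, the unnested form (?U)[\\W_]+ can be used instead. The class name, method names and sample input below are illustrative only; the ASCII-only variant is included just to show what the (?U) flag changes.

```java
class WordClassDemo {
    // Unicode-aware: letters such as \u00d1 count as word characters and are kept.
    static String unicodeAware(String s) {
        return s.replaceAll("(?U)[\\W_]+", " ");
    }

    // Without (?U), \W treats every non-ASCII letter as a non-word character.
    static String asciiOnly(String s) {
        return s.replaceAll("[\\W_]+", " ");
    }

    public static void main(String[] args) {
        // \u00b0 is the degree sign, \u00d1 is a Latin capital N with tilde
        String input = "More _0As text \u00b0\u00d1\"\u00d1";
        System.out.println(unicodeAware(input)); // keeps both \u00d1 characters
        System.out.println(asciiOnly(input));    // drops them with the symbols
    }
}
```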