I am trying to remove symbols and special characters from raw text in Java and could not find a way to do it. The text comes from a free-text field on a website, so it may contain literally anything. It arrives from an external source whose settings I cannot change, so I have to handle it on my end. Some examples:
1) belem 🐺 should be--> belem
2) Ariana 👑 should be--> Ariana
3) Harlem 🌊 should be--> Harlem
4) Yz 🏳️🌈 should be--> Yz
5) ここさけは7回は見に行くぞ👍💟 should be--> ここさけは7回は見に行くぞ
6) دمي ازرق وطني ازرق 💙🔵🔵🔵🔵 should be--> دمي ازرق وطني ازرق
Any help please?
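You can try this regex. It matches characters in the ranges U+1F000 to U+1F7FF (written as surrogate pairs) and U+2600 to U+27FF. That covers the emojis in the examples above, though not every emoji or symbol: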
regex = "[\\ud83c\\udc00-\\ud83c\\udfff]|[\\ud83d\\udc00-\\ud83d\\udfff]|[\\u2600-\\u27ff]"
Then remove all matches with the replaceAll() method:
String text = "ここさけは7回は見に行くぞ👍💟 ";
String regex = "[\\ud83c\\udc00-\\ud83c\\udfff]|[\\ud83d\\udc00-\\ud83d\\udfff]|[\\u2600-\\u27ff]";
System.out.println(text.replaceAll(regex, ""));
Output:
ここさけは7回は見に行くぞ
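Below is a minimal self-contained sketch of the same idea, assuming Java 7 or later. It compiles the pattern once instead of letting String.replaceAll recompile it on every call. It also checks that the surrogate-pair ranges match the same code-point ranges written with \x{...} escapes. The class name EmojiRegexDemo and the strip helper are hypothetical. The flag input is spelled out with escapes as the usual rainbow-flag sequence U+1F3F3 U+FE0F U+200D U+1F308.

```java
import java.util.regex.Pattern;

class EmojiRegexDemo {
    // The answer's pattern. java.util.regex reads each escaped surrogate pair as one
    // code point, so this is U+1F000-U+1F3FF, U+1F400-U+1F7FF, and the BMP range
    // U+2600-U+27FF (Miscellaneous Symbols, Dingbats and neighbouring blocks).
    static final Pattern SURROGATE_FORM = Pattern.compile(
            "[\\ud83c\\udc00-\\ud83c\\udfff]|[\\ud83d\\udc00-\\ud83d\\udfff]|[\\u2600-\\u27ff]");

    // The same ranges written with \x{...} code-point escapes (Java 7+).
    static final Pattern CODE_POINT_FORM = Pattern.compile(
            "[\\x{1F000}-\\x{1F7FF}\\x{2600}-\\x{27FF}]");

    // Reuses a precompiled Pattern; String.replaceAll compiles a new one per call.
    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "belem 🐺",
            "ここさけは7回は見に行くぞ👍💟",
            // Rainbow flag: U+1F3F3 U+FE0F U+200D U+1F308
            "Yz \uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08"
        };
        for (String s : inputs) {
            String a = strip(SURROGATE_FORM, s);
            String b = strip(CODE_POINT_FORM, s);
            // Print the length too, so leftover invisible characters are noticeable.
            System.out.println(a.equals(b) + " [" + a + "] length=" + a.length());
        }
    }
}
```

For the flag input, the variation selector U+FE0F and the zero-width joiner U+200D lie outside these ranges. They therefore stay in the result as invisible characters, so the regex alone does not give exactly "Yz".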
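If by "special characters" you mean characters stored as surrogate pairs (code points above U+FFFF), try this. The filter keeps only code points below U+D800 (Character.MIN_SURROGATE), so it also drops BMP characters from U+E000 upward.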
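static String removeSpecial(String s) {
    // Keep only code points below U+D800; every supplementary character
    // (stored as a surrogate pair) is above that threshold and is dropped.
    int[] r = s.codePoints()
            .filter(c -> c < Character.MIN_SURROGATE)
            .toArray();
    return new String(r, 0, r.length);
}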
Test it with:
String[] testStrs = {
    "belem 🐺",
    "Ariana 👑",
    "Harlem 🌊",
    "Yz 🏳️🌈",
    "ここさけは7回は見に行くぞ👍💟",
    "دمي ازرق وطني ازرق 💙🔵🔵🔵🔵"
};
for (String s : testStrs) {
    System.out.println(removeSpecial(s));
}
Results:
belem
Ariana
Harlem
Yz
ここさけは7回は見に行くぞ
دمي ازرق وطني ازرق
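To show what the threshold actually removes, here is a small comparison sketch. removeSpecial is copied from the answer. removeSupplementary (drops only code points U+10000 and above) and describe (lists code points) are hypothetical helpers added for illustration. The input is full-width "Ｊａｖａ" (U+FF2A U+FF41 U+FF56 U+FF41), a space, and the rainbow flag U+1F3F3 U+FE0F U+200D U+1F308.

```java
import java.util.stream.Collectors;

class SurrogateFilterDemo {
    // From the answer: keeps only code points below U+D800.
    static String removeSpecial(String s) {
        int[] r = s.codePoints()
                .filter(c -> c < Character.MIN_SURROGATE)
                .toArray();
        return new String(r, 0, r.length);
    }

    // Hypothetical variant: drops only supplementary code points (U+10000 and up),
    // so BMP characters such as full-width letters survive.
    static String removeSupplementary(String s) {
        int[] r = s.codePoints()
                .filter(c -> !Character.isSupplementaryCodePoint(c))
                .toArray();
        return new String(r, 0, r.length);
    }

    // Hypothetical helper: renders each code point as U+XXXX, space-separated.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(c -> String.format("U+%04X", c))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String s = "\uFF2A\uFF41\uFF56\uFF41 \uD83C\uDFF3\uFE0F\u200D\uD83C\uDF08";
        System.out.println(describe(removeSpecial(s)));
        System.out.println(describe(removeSupplementary(s)));
    }
}
```

removeSpecial also deletes the full-width letters, because they sit above U+D800. removeSupplementary keeps them, along with the invisible U+FE0F. Neither removes the zero-width joiner U+200D, which is a BMP character below U+D800.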
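Use a character class that keeps whitespace and the POSIX class for "any letter or digit in any language". In Java, \p{Alnum} and \s are ASCII-only by default, so turn on UNICODE_CHARACTER_CLASS with the embedded flag (?U) to keep the Japanese and Arabic letters:
str = str.replaceAll("(?U)[^\\s\\p{Alnum}]", "");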
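To see why the flag matters, this sketch runs the character class with and without (?U) on some of the question's inputs. The class and method names are hypothetical. With UNICODE_CHARACTER_CLASS, the Pattern documentation defines \p{Alnum} as [\p{IsAlphabetic}\p{IsDigit}].

```java
class AlnumFilterDemo {
    // No flag: \p{Alnum} is [a-zA-Z0-9] and \s is ASCII white space only.
    static String asciiOnly(String s) {
        return s.replaceAll("[^\\s\\p{Alnum}]", "");
    }

    // (?U) enables UNICODE_CHARACTER_CLASS: letters and digits from any script
    // are kept, and \s matches Unicode white space.
    static String unicodeAware(String s) {
        return s.replaceAll("(?U)[^\\s\\p{Alnum}]", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "belem 🐺",
            "ここさけは7回は見に行くぞ👍💟",
            "دمي ازرق وطني ازرق 💙🔵🔵🔵🔵"
        };
        for (String s : inputs) {
            // trim() removes the space that preceded the deleted emoji.
            System.out.println(asciiOnly(s).trim() + " | " + unicodeAware(s).trim());
        }
    }
}
```

Without the flag, the Japanese example shrinks to just "7" and the Arabic one becomes blank. With it, both keep their letters. The (?U) version also removes U+FE0F and U+200D from the flag sequence, because neither is alphanumeric nor white space.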