Remove non-printable characters from a string in Java
I have a string as below:
String s = "$$$$A very beautiful girl having loads of £££££ in her 20�s.";
I went through some StackOverflow answers and tried the following:
s = s.replaceAll("[^\\x00-\\x7F]", " ");
s = s.replaceAll("[^\\p{ASCII}]", " ");
Both of them remove the weird question mark, but they also remove the pound (£) sign while retaining the dollar ($) sign. I need to retain the currency symbols.
Can you suggest a more suitable approach?
Also, is there a library that can do this without using a regex?
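To reproduce the behaviour described above, the sketch below applies both of the question's patterns. It assumes the garbled character is U+FFFD (the Unicode REPLACEMENT CHARACTER), which is how a mis-decoded byte is commonly displayed; the class and method names are made up for illustration. Both patterns describe the same range, 0x00 to 0x7F: '$' is U+0024 and falls inside it, while '£' is U+00A3 and falls outside it.

```java
public class AsciiFilterDemo {
    // The two patterns from the question. replaceAll returns a new String
    // (Java strings are immutable), so the result must be kept.
    static String hexRange(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", " ");
    }

    static String posixAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]", " ");
    }

    public static void main(String[] args) {
        // Assumption: the "weird question mark" is U+FFFD (REPLACEMENT CHARACTER).
        String s = "$$$$A very beautiful girl having loads of "
                + "\u00A3\u00A3\u00A3\u00A3\u00A3 in her 20\uFFFDs.";

        // '$' is U+0024 (inside 0x00-0x7F); the pound sign is U+00A3 (outside).
        System.out.printf("$ = U+%04X, pound = U+%04X%n", (int) '$', (int) '\u00A3');
        System.out.println(hexRange(s));                       // pound signs become spaces
        System.out.println(hexRange(s).equals(posixAscii(s))); // true
    }
}
```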
Try using:
s = s.replaceAll("[^\\x00-\\xFF]", " ");
Your problem is that the pound sign (U+00A3) is part of the Latin-1 Supplement Unicode block (U+0080 to U+00FF), which is not included when you filter only up to 7F.
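A minimal sketch of that fix, next to an alternative that is not part of the answer: keeping ASCII plus every character in the Unicode currency-symbol category (\p{Sc}, which Java's Pattern supports), so that currency symbols outside Latin-1, such as the euro sign U+20AC, also survive. Class and method names are hypothetical. Keep in mind that the 0x00 to 0xFF range also lets through the C1 control characters U+0080 to U+009F.

```java
public class Latin1FilterDemo {
    // The answer's suggestion: keep Basic Latin plus the Latin-1 Supplement
    // (0x00-0xFF), which contains the pound sign U+00A3.
    static String keepLatin1(String s) {
        return s.replaceAll("[^\\x00-\\xFF]", " ");
    }

    // Alternative (not from the answer): keep ASCII plus any character in the
    // Unicode general category Sc (currency symbol), e.g. U+20AC euro sign.
    static String keepAsciiAndCurrency(String s) {
        return s.replaceAll("[^\\p{ASCII}\\p{Sc}]", " ");
    }

    public static void main(String[] args) {
        // Assumption: the garbled character in the question is U+FFFD.
        String s = "$$$$A very beautiful girl having loads of "
                + "\u00A3\u00A3\u00A3\u00A3\u00A3 in her 20\uFFFDs.";
        System.out.println(keepLatin1(s));
        System.out.println(keepAsciiAndCurrency(s));
    }
}
```

Both variants replace U+FFFD with a space; they differ only for characters above U+00FF, such as the euro sign.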
To efficiently remove all unprintable characters from a string, including the often-overlooked Unicode control and format codes (such as bidirectional override characters) that have been exploited by attackers to disguise text:
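// Sample input: CR, LF and backspace are CONTROL, U+200E (left-to-right mark)
// is FORMAT, and a lone U+DB80 is an unpaired SURROGATE.
String broken = "\r\nhello world\b\u200E\uDB80";
StringBuilder fixed = broken.codePoints()
    .filter(c -> {
        switch (Character.getType(c)) {
            case Character.CONTROL:      // e.g. \r, \n, \b
            case Character.FORMAT:       // e.g. U+200E, bidi overrides
            case Character.PRIVATE_USE:
            case Character.SURROGATE:    // unpaired surrogate halves
            case Character.UNASSIGNED:
                return false;            // drop
            default:
                return true;             // keep
        }
    })
    .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append);
assertEquals("hello world", fixed.toString()); // JUnit assertion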
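Applied to the string from the question, this blacklist behaves differently from the regex answers: the pound sign has type CURRENCY_SYMBOL and is kept, but U+FFFD (assuming that is what the weird question mark is) has type OTHER_SYMBOL and is kept as well. The sketch below, built around a hypothetical helper named blacklisted that mirrors the case list above, prints the type of a few sample code points. For comparison, if the text had been decoded as ISO-8859-1, a Windows-1252 apostrophe byte 0x92 would appear as U+0092, a C1 CONTROL character, and would be dropped.

```java
public class CharTypeDemo {
    // Hypothetical helper mirroring the answer's case list:
    // returns true if the blacklist would drop code point c.
    static boolean blacklisted(int c) {
        switch (Character.getType(c)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.PRIVATE_USE:
            case Character.SURROGATE:
            case Character.UNASSIGNED:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // '$', pound sign, U+FFFD, backspace, left-to-right mark, C1 control U+0092
        int[] samples = {'$', 0x00A3, 0xFFFD, '\b', 0x200E, 0x0092};
        for (int c : samples) {
            System.out.printf("U+%04X type=%d dropped=%b%n",
                    c, Character.getType(c), blacklisted(c));
        }
    }
}
```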
If you want to remove other character classes, simply add them to the case statement. This implements a blacklist. If you prefer a whitelist, invert the logic: return true when a character is an acceptable type and return false for all others.
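One possible whitelist is sketched below. It keeps letters, decimal digits, space separators, punctuation, math symbols and currency symbols, and drops everything else; the choice of categories and the names WhitelistFilter, allowed and keepWhitelisted are assumptions for illustration, so adjust them to your data. On the question's string it keeps both $ and £ and drops U+FFFD, which is OTHER_SYMBOL. Note that it also drops tabs and line breaks, because those are CONTROL characters.

```java
public class WhitelistFilter {
    // Hypothetical whitelist of Unicode general categories to keep.
    static boolean allowed(int c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.SPACE_SEPARATOR:
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
            case Character.MATH_SYMBOL:
            case Character.CURRENCY_SYMBOL:
                return true;
            default:
                return false;  // controls, format chars, other symbols, ...
        }
    }

    // Keep only whitelisted code points, same collect idiom as the answer.
    static String keepWhitelisted(String input) {
        return input.codePoints()
                .filter(WhitelistFilter::allowed)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // Assumption: the garbled character in the question is U+FFFD.
        String s = "$$$$A very beautiful girl having loads of "
                + "\u00A3\u00A3\u00A3\u00A3\u00A3 in her 20\uFFFDs.";
        System.out.println(keepWhitelisted(s));
    }
}
```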
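Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.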