
Remove non-printable characters from a string in Java

I have a string as below:

String s = "$$$$A very beautiful girl having loads of £££££ in her 20�s.";

I went through some StackOverflow answers and tried the following:

s.replaceAll("[^\\x00-\\x7F]", " ");

s.replaceAll("[^\\p{ASCII}]", " ");

Both of them remove the weird question mark, but they also remove the pound (£) sign while retaining the dollar ($) sign. I need to keep the currency symbols. Can you suggest a more suitable approach?

Also, is there a library that can do this instead of using a regex?

Try using:

s.replaceAll("[^\\x00-\\xFF]", " ");

Your problem is that the pound sign is part of the Latin-1 Supplement Unicode block, which is excluded when you filter only up to 7F.
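To make the difference between the two ranges concrete, here is a minimal sketch. The class name `RangeFilterDemo`, the helper `keepOnly`, and the sample string are made up for illustration and are not from the question. Note also that `replaceAll` returns a new string rather than modifying `s`, so the result has to be assigned or used.

```java
public class RangeFilterDemo {
    // Hypothetical helper: replaces every character that is NOT in the given
    // regex character class with a single space and returns the new string.
    static String keepOnly(String input, String allowedClass) {
        return input.replaceAll("[^" + allowedClass + "]", " ");
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign; \u2019 is a typographic apostrophe (outside Latin-1).
        String s = "$5 or \u00A35 in her 20\u2019s";

        // ASCII only (00-7F): the pound sign is replaced along with the apostrophe.
        System.out.println(keepOnly(s, "\\x00-\\x7F"));

        // ASCII plus Latin-1 Supplement (00-FF): the pound sign survives.
        System.out.println(keepOnly(s, "\\x00-\\xFF"));
    }
}
```

Keep in mind that the 00-FF range keeps only currency symbols that happen to live below U+0100; a symbol such as the euro sign (U+20AC) would still be replaced.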

To efficiently remove all unprintable characters from a string, including the often-overlooked Unicode control codes that attackers have exploited:

    String broken = "\r\nhello world\b\u200E\uDB80";

    StringBuilder fixed = broken.codePoints()
        .filter(c -> {
          switch (Character.getType(c)) {
            case Character.CONTROL:
            case Character.FORMAT:
            case Character.PRIVATE_USE:
            case Character.SURROGATE:
            case Character.UNASSIGNED:
              return false;
            default:
              return true;
          }
        })
        .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append);

    assertEquals("hello world", fixed.toString());

If you want to remove other character categories, simply add them as case labels in the switch statement. This implements a blacklist. If you prefer a whitelist, you can invert the logic to return true when a character is of an acceptable type and false for all others.
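The whitelist variant could look roughly like the sketch below. The class name `PrintableWhitelist`, the method names, and the particular set of accepted categories are assumptions chosen for illustration; adjust the case labels to whatever your input should be allowed to contain.

```java
public class PrintableWhitelist {
    // Hypothetical helper: keeps only code points whose general category is whitelisted.
    static String keepAllowed(String input) {
        return input.codePoints()
            .filter(PrintableWhitelist::isAllowed)
            .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
            .toString();
    }

    // Returns true only for an explicitly accepted category; everything else is dropped.
    static boolean isAllowed(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.SPACE_SEPARATOR:
            case Character.OTHER_PUNCTUATION:
            case Character.CURRENCY_SYMBOL:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Control and format characters are removed; $ and \u00A3 (pound) are kept.
        System.out.println(keepAllowed("\r\n$5 or \u00A35.\b\u200E"));
    }
}
```

A whitelist is stricter than the blacklist above: any category you forget to list (for example dash punctuation or math symbols) is silently removed, so it is worth checking the result against representative input.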
