Remove all control characters from a Java string
I have a string coming from a UI that contains control characters such as line feeds and carriage returns.
I would like to do something like this:
String input = uiString.replaceAll(<regex for all control characters> , "")
Surely this has been done before!?
Using Guava, which is probably more efficient than using the full regex engine, and certainly more readable:
return CharMatcher.JAVA_ISO_CONTROL.removeFrom(string);
Alternatively, using just a regex, albeit less readably and probably less efficiently:
return string.replaceAll("\\p{Cntrl}", "");
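If efficiency is the concern, note that String.replaceAll compiles its regex on every call (it delegates to Pattern.compile(regex).matcher(this).replaceAll(...)). A minimal sketch that compiles the pattern once and reuses it; CntrlStripper and removeControl are hypothetical names, not part of any library:

```java
import java.util.regex.Pattern;

final class CntrlStripper {

    // Compiled once and reused; String.replaceAll would recompile it per call.
    private static final Pattern ASCII_CONTROL = Pattern.compile("\\p{Cntrl}");

    private CntrlStripper() {}

    /** Removes ASCII control characters: \p{Cntrl} is [\x00-\x1F\x7F]. */
    static String removeControl(String input) {
        return ASCII_CONTROL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeControl("foo\r\nbar\tbaz")); // prints foobarbaz
    }
}
```

Pattern instances are immutable and safe to share between threads, so a static final field is a reasonable place for it.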
Something like this should do the trick:
String newString = oldString.replaceAll("[\u0000-\u001f]", "");
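One difference worth knowing: the explicit range stops at U+001F, while \p{Cntrl} is [\x00-\x1F\x7F], so only the latter also removes DEL (U+007F). A small comparison sketch (RangeVsCntrl, byRange and byCntrl are illustrative names). It writes the range with double backslashes so the regex engine parses the escapes itself; that matches the same characters as the literal in the answer above:

```java
final class RangeVsCntrl {

    private RangeVsCntrl() {}

    /** Explicit range: U+0000 through U+001F only. */
    static String byRange(String input) {
        // Double backslashes: the escapes are interpreted by java.util.regex.
        return input.replaceAll("[\\u0000-\\u001f]", "");
    }

    /** Default (ASCII) \p{Cntrl}: [\x00-\x1F\x7F], i.e. the range plus DEL. */
    static String byCntrl(String input) {
        return input.replaceAll("\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        String input = "a\tb\u007fc"; // TAB (U+0009) and DEL (U+007F)
        System.out.println(byRange(input).equals("ab\u007fc")); // true: DEL survives
        System.out.println(byCntrl(input).equals("abc"));       // true: DEL removed
    }
}
```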
To remove just the ASCII control characters, use the Cntrl character class:
String newString = string.replaceAll("\\p{Cntrl}", "");
To remove all 65 of the characters that Unicode refers to as "control characters", use the Cntrl character class in UNICODE_CHARACTER_CLASS mode, with the (?U) flag:
String newString = string.replaceAll("(?U)\\p{Cntrl}", "");
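A sketch of the difference the (?U) flag makes: without it, \p{Cntrl} leaves C1 controls such as NEL (U+0085) in place; with it, \p{Cntrl} means \p{gc=Cc}. The class and method names are hypothetical, and countUnicodeControls is only a brute-force sanity check of the "65" figure (roughly a million matcher calls, so it is not something to run in production code):

```java
import java.util.regex.Pattern;

final class UnicodeCntrl {

    // Default mode: POSIX classes such as \p{Cntrl} are US-ASCII only.
    private static final Pattern ASCII_CNTRL = Pattern.compile("\\p{Cntrl}");
    // (?U) is the embedded form of Pattern.UNICODE_CHARACTER_CLASS; in that
    // mode \p{Cntrl} is \p{gc=Cc}, which adds the C1 range U+0080..U+009F.
    private static final Pattern UNICODE_CNTRL = Pattern.compile("(?U)\\p{Cntrl}");

    private UnicodeCntrl() {}

    static String stripAscii(String s) {
        return ASCII_CNTRL.matcher(s).replaceAll("");
    }

    static String stripUnicode(String s) {
        return UNICODE_CNTRL.matcher(s).replaceAll("");
    }

    /** Counts every code point the Unicode-mode pattern matches. */
    static int countUnicodeControls() {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            String one = new String(Character.toChars(cp));
            if (UNICODE_CNTRL.matcher(one).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a\u0085b\r\nc"; // NEL (U+0085) plus CR LF
        System.out.println(stripAscii(input).equals("a\u0085bc")); // true
        System.out.println(stripUnicode(input).equals("abc"));     // true
        System.out.println(countUnicodeControls());                // 65
    }
}
```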
To additionally remove Unicode "format" characters (things like the control characters for making text go right-to-left, or the soft hyphen), also nuke the Cf character class:
String newString = string.replaceAll("(?U)\\p{Cntrl}|\\p{Gc=Cf}", "");
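A sketch that runs the pattern above next to a regex-free equivalent based on Character.getType, where Character.CONTROL corresponds to Cc and Character.FORMAT to Cf. ControlAndFormat, stripRegex and stripByType are made-up names. Both versions work on code points, so a supplementary format character such as LANGUAGE TAG (U+E0001) is removed as a whole rather than leaving half a surrogate pair behind:

```java
import java.util.regex.Pattern;

final class ControlAndFormat {

    // Same pattern as above: Unicode Cc via (?U)\p{Cntrl}, plus Cf.
    private static final Pattern CC_OR_CF = Pattern.compile("(?U)\\p{Cntrl}|\\p{Gc=Cf}");

    private ControlAndFormat() {}

    static String stripRegex(String s) {
        return CC_OR_CF.matcher(s).replaceAll("");
    }

    /** Keeps only code points whose general category is neither Cc nor Cf. */
    static String stripByType(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> {
             int type = Character.getType(cp);
             return type != Character.CONTROL && type != Character.FORMAT;
         })
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Soft hyphen (U+00AD), right-to-left mark (U+200F), CR LF,
        // and the supplementary format character U+E0001.
        String input = "soft\u00ADhyphen\u200Fend\r\n"
                + new String(Character.toChars(0xE0001));
        System.out.println(stripRegex(input));                             // softhyphenend
        System.out.println(stripRegex(input).equals(stripByType(input))); // true
    }
}
```

Be aware that Cf also contains the zero-width joiner and non-joiner (U+200D, U+200C) and the byte order mark (U+FEFF). Removing the joiner breaks emoji sequences, for example, so this may be too aggressive for free-form user text.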
Guava's CharMatcher.JAVA_ISO_CONTROL is deprecated; use javaIsoControl() instead:
CharMatcher.javaIsoControl().removeFrom(string);
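If this is the only reason to pull in Guava, a JDK-only sketch can do the same job. It assumes, as Guava's Javadoc describes it, that javaIsoControl() matches exactly the chars for which Character.isISOControl(char) is true; IsoControlStripper and removeIsoControl are hypothetical names:

```java
final class IsoControlStripper {

    private IsoControlStripper() {}

    /**
     * Drops every char for which Character.isISOControl is true, i.e.
     * U+0000..U+001F and U+007F..U+009F (the same 65 Cc characters).
     * Iterating by char is safe: surrogates are never ISO controls,
     * so surrogate pairs pass through intact.
     */
    static String removeIsoControl(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!Character.isISOControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeIsoControl("line1\r\nline2\tend")); // line1line2end
    }
}
```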