
Remove all control characters from a Java string

I have a string coming from a UI that contains control characters such as line feeds and carriage returns.

I would like to do something like this:

String input = uiString.replaceAll(<regex for all control characters> , "")

Surely this has been done before!?

Using Guava, which is probably more efficient than the full regex engine and certainly more readable:

return CharMatcher.JAVA_ISO_CONTROL.removeFrom(string);

Alternatively, using just a regex, albeit not quite as readably or efficiently:

return string.replaceAll("\\p{Cntrl}", "");
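
A minimal, dependency-free sketch of the regex approach, in case it helps. The class and method names (AsciiControlStripper, strip) are hypothetical. Precompiling the pattern is an optional tweak: String.replaceAll compiles its regex on every call, so a shared Pattern avoids that repeated work.

```java
import java.util.regex.Pattern;

final class AsciiControlStripper {
    // In the default (non-Unicode) mode, \p{Cntrl} matches U+0000-U+001F and U+007F.
    private static final Pattern ASCII_CONTROL = Pattern.compile("\\p{Cntrl}");

    // Returns the input with every ASCII control character removed.
    static String strip(String input) {
        return ASCII_CONTROL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String uiString = "line one\r\nline two\tend";
        System.out.println(strip(uiString)); // prints "line oneline twoend"
    }
}
```

Note that removing line breaks outright glues adjacent words together, as the output shows. Replacing with a space instead of the empty string may be preferable for text that will be displayed.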

Something like this should do the trick:

String newString = oldString.replaceAll("[\u0000-\u001f]", "");

To remove just ASCII control characters, use the Cntrl character class:

String newString = string.replaceAll("\\p{Cntrl}", "");

To remove all 65 of the characters that Unicode refers to as "control characters", use the Cntrl character class in UNICODE_CHARACTER_CLASS mode, with the (?U) flag:

String newString = string.replaceAll("(?U)\\p{Cntrl}", "");
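
The difference between the two modes can be seen with a C1 control character such as U+0085 (NEXT LINE), which lies outside ASCII. The following is only an illustrative sketch; the class and method names are made up for the example.

```java
final class UnicodeControlDemo {
    // Default mode: only U+0000-U+001F and U+007F are treated as control characters.
    static String stripAsciiControls(String s) {
        return s.replaceAll("\\p{Cntrl}", "");
    }

    // (?U) turns on UNICODE_CHARACTER_CLASS, so \p{Cntrl} covers the whole
    // Unicode Cc category: U+0000-U+001F, U+007F and U+0080-U+009F (32 + 1 + 32 = 65).
    static String stripUnicodeControls(String s) {
        return s.replaceAll("(?U)\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        // 'a', U+0085 (a C1 control), 'b', line feed
        String sample = "a" + (char) 0x0085 + "b\n";
        System.out.println(stripAsciiControls(sample).length());   // prints 3 (U+0085 survives)
        System.out.println(stripUnicodeControls(sample).length()); // prints 2
    }
}
```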

To additionally remove Unicode "format" characters (things like the control characters for making text go right-to-left, or the soft hyphen), also nuke the Cf general category:

String newString = string.replaceAll("(?U)\\p{Cntrl}|\\p{Gc=Cf}", "");
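
As a quick check of the pattern above, here is a small sketch that strips a soft hyphen (U+00AD) and a right-to-left mark (U+200F), both of which are in the Cf category, along with an ordinary line feed. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class ControlAndFormatStripper {
    // Unicode control characters (Cc, via (?U)\p{Cntrl}) plus format characters (Cf).
    private static final Pattern CONTROL_OR_FORMAT =
            Pattern.compile("(?U)\\p{Cntrl}|\\p{Gc=Cf}");

    static String strip(String input) {
        return CONTROL_OR_FORMAT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // "co", soft hyphen, "operate", right-to-left mark, line feed
        String sample = "co" + (char) 0x00AD + "operate" + (char) 0x200F + "\n";
        System.out.println(strip(sample)); // prints "cooperate"
    }
}
```

Stripping Cf characters changes how bidirectional text renders, so this is probably only appropriate when the string is used as an identifier or key rather than shown to users.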

Guava's CharMatcher.JAVA_ISO_CONTROL is deprecated; use javaIsoControl() instead:

CharMatcher.javaIsoControl().removeFrom(string);
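
If pulling in Guava is not an option, roughly the same result can be had from the JDK alone. This sketch assumes that the matcher's behaviour corresponds to Character.isISOControl(char), which is true for U+0000-U+001F and U+007F-U+009F; the class and method names are invented for illustration.

```java
final class IsoControlStripper {
    // Copies every char that is not an ISO control character into a new string.
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String uiString = "first\r\nsecond" + (char) 0x0085;
        System.out.println(strip(uiString)); // prints "firstsecond"
    }
}
```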
