
Eliminating Unicode Characters and Escape Characters from String

I want to remove all Unicode characters and escape characters such as \n and \t. In short, I want just the alphanumeric string.

For example:

\u2029My Actual String\u2029 \nMy Actual String\n

I want to fetch just 'My Actual String'. Is there any way to do so, either by using a built-in string method or a regular expression?

Try this:

anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");

to remove escaped characters. If you also want to remove all other special characters, use this one:

anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");

(I guess you want to keep the whitespace; if not, remove \\s from the pattern above.)
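To see what these two replacements do, here is a minimal sketch that wraps them in two helpers and runs them on the string from the question. The class name `StripEscapes` and the helper names are made up for illustration. The sketch assumes the escapes are literal text (a backslash followed by a character), and it matches four hexadecimal digits after \u, since Unicode escapes such as \u00e9 contain letters as well as digits.

```java
public class StripEscapes {
    // Hypothetical helper: removes literal escape sequences only
    // (backslash-u plus four hex digits, or a backslash plus any one character).
    static String stripEscapes(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");
    }

    // Hypothetical helper: additionally removes everything that is not
    // an ASCII letter, an ASCII digit, or whitespace.
    static String keepAlphanumeric(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        // The escapes below are literal text: a backslash followed by 'u2029' or 'n'.
        String raw = "\\u2029My Actual String\\u2029 \\nMy Actual String\\n";
        System.out.println(stripEscapes(raw));      // My Actual String My Actual String

        String noisy = "\\tHello, World!\\n";
        System.out.println(keepAlphanumeric(noisy)); // Hello World
    }
}
```

Note that these patterns do nothing to real control characters: if the string holds an actual newline rather than the two characters backslash and n, the first pattern leaves it in place and the second keeps it because of \\s.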

Try:

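import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The literal below contains real U+2029 and newline characters, not backslash text
String stg = "\u2029My Actual String\u2029 \nMy Actual String";
Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
Matcher mat = pat.matcher(stg);
StringBuilder out = new StringBuilder();
while (mat.find()) {
    // Append each run of word characters, separated by a single space
    out.append(mat.group()).append(' ');
}
System.out.println(out.toString().trim());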

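The regex matches runs of word characters (\w+). The negative lookahead rejects any position that starts a literal \uXXXX escape or a whitespace character, so Unicode separators and escape characters are left out. The regex represented pictorially:

[Image: regex diagram]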

Output:

My Actual String My Actual String
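The same result can be had with a single replacement instead of a find loop. The following is only a sketch of that alternative, not the answerer's method: the class `AlnumOnly` and its helper are hypothetical names. It relies on Java's `\p{Alnum}` class, which by default covers ASCII letters and digits only, so unlike `\w` it also drops underscores.

```java
public class AlnumOnly {
    // Hypothetical helper: collapses every run of non-alphanumeric characters
    // (real control characters, Unicode separators, punctuation, spaces)
    // into one space, then trims the ends.
    static String alnumOnly(String s) {
        return s.replaceAll("[^\\p{Alnum}]+", " ").trim();
    }

    public static void main(String[] args) {
        // Real U+2029 and newline characters, as in the answer above
        String stg = "\u2029My Actual String\u2029 \nMy Actual String";
        System.out.println(alnumOnly(stg)); // My Actual String My Actual String
    }
}
```

This variant fits strings that contain the actual characters. For strings holding literal backslash sequences, those sequences need to be removed first, otherwise the letter after the backslash survives.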
