
Eliminating Unicode Characters and Escape Characters from String

I want to remove all Unicode characters and escape characters (such as \n and \t) from a string. In short, I want just the alphanumeric string.

For example:

\u2029My Actual String\u2029
 \nMy Actual String\n

I want to get just 'My Actual String'. Is there any way to do this, either with a built-in string method or a regular expression?

Try this:

anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");

to remove escape sequences that appear in the string as literal text: a backslash followed by u and four hex digits, or a backslash followed by any single character. If you also want to remove all other special characters, use this one:

anyString = anyString.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");

(I assume you want to keep whitespace; if not, remove \\s from the pattern above.)
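As a rough check of this answer, the sketch below applies both patterns to a sample input. It assumes the escape sequences are present in the string as literal text (a backslash followed by letters), not as real control characters. The class and method names are hypothetical, and as a further assumption the pattern accepts hex digits after the u, so that escapes containing a to f are also removed.

```java
class EscapeStripDemo {
    // Removes escape sequences stored as literal text: a backslash, the letter u
    // and four hex digits, or a backslash followed by any single character.
    // Assumption: hex digits are accepted, not only decimal digits.
    static String stripEscapes(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.", "");
    }

    // Same as above, and also removes every character that is not
    // an ASCII letter, an ASCII digit, or whitespace.
    static String stripEscapesAndSpecials(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}|\\\\.|[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        // Each doubled backslash in the source is one literal backslash in the string.
        String raw = "\\u2029My Actual String\\u2029 \\nMy Actual String\\n";
        System.out.println(stripEscapes(raw));
        // prints: My Actual String My Actual String

        System.out.println(stripEscapesAndSpecials("\\tHello, World!\\n"));
        // prints: Hello World
    }
}
```

If the string instead holds real control characters (for example, it was built from a Java literal with a single backslash), the first pattern finds no backslashes and leaves the string unchanged; only the character-class part of the second pattern has any effect in that case.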

Try

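import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // The literal holds real U+2029 and newline characters, not backslash text.
        String stg = "\u2029My Actual String\u2029 \nMy Actual String";
        Pattern pat = Pattern.compile("(?!(\\\\(u|U)\\w{4}|\\s))(\\w)+");
        Matcher mat = pat.matcher(stg);
        StringBuilder out = new StringBuilder();
        while (mat.find()) {
            out.append(mat.group()).append(' '); // join the matched words with spaces
        }
        System.out.println(out.toString().trim()); // trim drops the trailing space
    }
}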

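The regex matches runs of word characters (letters, digits, and underscores); everything else, including the Unicode separators and the newline, is skipped. The negative lookahead never rejects a position where \w can match, so it does not change the result for this input. The regex, represented pictorially:

[Image: diagram of the regex]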

Output:

My Actual String My Actual String
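The same output can also be reached without a matcher loop. The sketch below is an alternative technique, not the answer's method: it replaces every run of characters outside ASCII letters and digits with a single space and then trims the ends. The class and method names are hypothetical, and unlike \w this character class also drops underscores.

```java
class AlnumOnlyDemo {
    // Collapses each run of characters that are not ASCII letters or digits
    // into one space, then removes leading and trailing whitespace.
    static String keepAlphanumeric(String s) {
        return s.replaceAll("[^A-Za-z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        // This literal contains real U+2029 and newline characters.
        String stg = "\u2029My Actual String\u2029 \nMy Actual String";
        System.out.println(keepAlphanumeric(stg));
        // prints: My Actual String My Actual String
    }
}
```

Because each run of separators becomes one space, words stay apart even where the input had several unwanted characters in a row.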
