
Remove escaped unicode string in Java with regex

I have a string like the one below:

"them coming \nLove it \ud83d\ude00"

I want to remove the characters "\ud83d\ude00" so that it becomes:

"them coming \nLove it "

How can I achieve this in Java? I have tried the code below, but it does not work:

payload.toString().replaceAll("\\\\u\\b{4}.", "")

Thanks :)

I think \\\\u\\b{4}. will not work, because by the time the regex runs, \ud83d is a symbol (an actual character in the string), not the literal text "backslash, u, d83d". So to match this kind of unwanted (for whatever reason) Unicode character, it is better to exclude the characters you accept (do not want to replace), for example all ASCII characters, and match everything else (what you want to replace). Try with:

[^\x00-\x7F]+

The range \x00-\x7F covers the Unicode Basic Latin block, i.e. plain ASCII.

String str = "them coming \nLove it \ud83d\ude00";
System.out.println(str.replaceAll("[^\\x00-\\x7F]+", ""));

This results in:

them coming
Love it
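
Going back to why a pattern that looks for a backslash finds nothing here, it may help to inspect what the string actually holds. The following is only a minimal sketch: the class name EscapeCheck and the helper stripLiteralEscapes are made up for illustration, and the second half assumes a raw payload in which the escapes are still literal text, which the question does not confirm.

```java
class EscapeCheck {
    // Hypothetical helper: removes literal six-character escape sequences
    // (a backslash, the letter u, then four hex digits) from raw text.
    static String stripLiteralEscapes(String s) {
        return s.replaceAll("\\\\u[0-9a-fA-F]{4}", "");
    }

    public static void main(String[] args) {
        // Case 1: the compiler has already decoded the escapes into two
        // UTF-16 code units (a surrogate pair), so no backslash is left.
        String decoded = "Love it \ud83d\ude00";
        System.out.println(decoded.contains("\\"));                       // false
        System.out.println(decoded.length());                             // 10
        System.out.println(Integer.toHexString(decoded.codePointAt(8)));  // 1f600

        // Case 2 (assumption): the payload still contains the escapes as
        // plain text. Only then can a backslash-based pattern match.
        String raw = "Love it \\ud83d\\ude00";
        System.out.println(raw.length());                                 // 20
        System.out.println(stripLiteralEscapes(raw).equals("Love it "));  // true
    }
}
```

If the first case describes your data, match the characters themselves, as in the answer above; if the second one does, a literal-escape pattern along these lines is the more likely fit.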

However, [^\x00-\x7F]+ will cause a problem if your text uses national characters or any other non-ASCII symbols (ś, ą, ♉, ☹, etc.), because those get removed as well.
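
One possible way around that, sketched here under the assumption that the unwanted characters are emoji outside the Basic Multilingual Plane (like the one in the question), is to match only code points above U+FFFF instead of everything non-ASCII. The class and method names below are hypothetical, and the \x{h...h} code point syntax needs Java 7 or later.

```java
class EmojiStripper {
    // Hypothetical helper: removes every supplementary code point
    // (U+10000 to U+10FFFF) and leaves all BMP characters untouched.
    static String stripSupplementary(String s) {
        return s.replaceAll("[\\x{10000}-\\x{10FFFF}]", "");
    }

    public static void main(String[] args) {
        // Sample text: the string from the question followed by the BMP
        // characters s-acute, a-ogonek, the Taurus sign and a frowning face,
        // written as escapes so the source file stays ASCII-only.
        String str = "them coming \nLove it \ud83d\ude00 \u015b\u0105\u2649\u2639";
        String cleaned = stripSupplementary(str);

        // The surrogate pair is gone, the BMP characters remain.
        System.out.println(cleaned.equals("them coming \nLove it  \u015b\u0105\u2649\u2639")); // true
    }
}
```

Note that symbols such as ☹ and ♉ sit inside the BMP, so this pattern keeps them; whether that is desirable depends on what you consider unwanted.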


 