
How to remove \u200B (Zero Width Space Unicode character) from a String in Java?

My application uses Spring Integration to poll email from an Outlook mailbox.

Because it receives the String (the email body) from an external system (Outlook), I have no control over its content.

For example:

String emailBodyStr= "rejected by sundar14-\u200B.";

Now I am trying to remove the Unicode character \u200B from this String.

What I have already tried:

Try #1:

emailBodyStr = emailBodyStr.replaceAll("\u200B", "");

Try #2:

emailBodyStr = emailBodyStr.replaceAll("\u200B", "").trim();

Try #3 (using Apache Commons):

StringEscapeUtils.unescapeJava(emailBodyStr);

Try #4:

StringEscapeUtils.unescapeJava(emailBodyStr).trim();

Nothing has worked so far.

I printed the String using the code below:

logger.info("Comment BEFORE:{}",emailBodyStr);
logger.info("Comment AFTER :{}",emailBodyStr);

In the Eclipse console, the Unicode character is NOT printed:

Comment BEFORE:rejected by sundar14-.

But the same code prints the Unicode character in the Linux console, as below:

Comment BEFORE:rejected by sundar14-\u200B.
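Because the two consoles render the character differently, it is hard to tell from the log alone whether the String holds the real zero width space or the six literal characters of the escape text. A minimal sketch of a diagnostic that prints every code point follows; the class and method names (CodePointDump, describe) are hypothetical and not part of the original post.

```java
import java.util.stream.Collectors;

// Hypothetical helper (not from the original post): renders every code point
// of a string as U+XXXX so that invisible characters become visible in a log.
final class CodePointDump {

    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Contains the actual zero width space character.
        String real = "14-\u200B.";
        // Contains a backslash followed by the letters u, 2, 0, 0, B.
        String escaped = "14-\\u200B.";

        System.out.println(describe(real));
        System.out.println(describe(escaped));
    }
}
```

If the dump shows U+200B, the character itself is present. If it shows U+005C U+0075 and so on, the String contains the escape text, and a pattern that targets the character will not match it.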

I have read some examples where str.replace() is recommended, but note that those examples use JavaScript and PHP, not Java.

Finally, I was able to remove the 'Zero Width Space' character by using a Unicode regex:

// \p{Cf} matches every character in the Unicode "format" category, which includes U+200B.
String plainEmailBody = emailBodyStr.replaceAll("\\p{Cf}", "");
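The category-based pattern above removes every format character, not only U+200B. As a sketch of the trade-off, the following compares it with a narrower replacement that targets U+200B alone. The class and method names are hypothetical and chosen only for illustration.

```java
// Hypothetical helper (not from the original post) contrasting two ways
// of removing the zero width space.
final class FormatCharStripper {

    // Removes every character in Unicode general category Cf (format),
    // which includes U+200B ZERO WIDTH SPACE and U+FEFF (byte order mark).
    static String stripFormatChars(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    // Narrower alternative: removes only U+200B and leaves other
    // format characters untouched.
    static String stripZeroWidthSpace(String s) {
        return s.replace("\u200B", "");
    }

    public static void main(String[] args) {
        String emailBodyStr = "rejected by sundar14-\u200B.";
        System.out.println(stripFormatChars(emailBodyStr));
        System.out.println(stripZeroWidthSpace(emailBodyStr));
    }
}
```

The broader pattern is convenient when the input may carry other invisible format characters. The narrower one is safer when some of those characters are meaningful, for example joiners used in certain scripts and emoji sequences.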

References for finding the category of a Unicode character:

  1. The Character class from Java.

The Character class lists all of these Unicode categories.


  2. Website: http://www.fileformat.info/

[Image: character categories]

  3. Website: http://www.regular-expressions.info/ => Unicode Regular Expressions

[Image: Unicode regular expression for the \u200B character]
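The category of a character can also be checked directly in code with the Character class named in the first reference. The sketch below uses Character.getType and Character.FORMAT from the standard library; the wrapper class and method names are hypothetical.

```java
// Hypothetical helper (not from the original post) that looks up the
// Unicode general category of a code point with java.lang.Character.
final class CategoryCheck {

    // True when the code point belongs to general category Cf (format).
    static boolean isFormatChar(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    public static void main(String[] args) {
        // U+200B is a format character, so \p{Cf} matches it.
        System.out.println(isFormatChar(0x200B));
        // An ordinary space is a space separator, not a format character.
        System.out.println(isFormatChar(' '));
        // Java does not treat U+200B as whitespace.
        System.out.println(Character.isWhitespace(0x200B));
    }
}
```

This also explains why trim() does not help here: trim() only strips leading and trailing characters whose code is U+0020 or lower, and U+200B is neither at that range nor classified as whitespace.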

Note 1: Because I received this string from an Outlook email body, none of the approaches listed in my question worked.

My application receives the String from an external system (Outlook), so I have no control over it.

Note 2: This SO answer introduced me to Unicode regular expressions.
