
How to remove a Unicode string in Java

I have written a Spring Boot app that reads from DynamoDB and generates an XML document. One of the items in the table has a field that contains the string '\u0019'. This is the Unicode control character that denotes End of Medium. The screenshot below shows how it looks in DynamoDB. [screenshot]

The Spring Boot app reads it. With IntelliJ, I inspected the variable that holds this value. It looks like this: [screenshot]

When I write this value to the XML, the XML tag looks like this: [screenshot]

Another program then tries to parse this XML. It fails with the following complaint:

XML character (Unicode: 0x19) at lineNumber: ___ ; columnNumber: ___ ;

I want to check whether a string contains such a Unicode control character and, if it does, remove it. I tried using:

  • Apache library: StringEscapeUtils.unescapeJava(test2)
  • replaceAll("\\P{Print}", "");

The problem with these is that they also remove characters like é. For example, L'Oréal becomes L'Oral or LOral.
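The loss of é is consistent with how Java documents its POSIX character classes: by default `\p{Print}` covers US-ASCII only, so `\P{Print}` matches every non-ASCII letter. A minimal sketch reproducing this, and showing the documented `(?U)` flag (UNICODE_CHARACTER_CLASS) as one possible alternative; the class and method names here are illustrative only:

```java
public class PrintClassDemo {

    // Default behaviour: \p{Print} is US-ASCII only, so é counts as "non-printable".
    static String asciiPrintOnly(String s) {
        return s.replaceAll("\\P{Print}", "");
    }

    // With (?U), \p{Print} follows the Unicode definition, so é is kept
    // while control characters such as U+0019 are still removed.
    static String unicodePrintOnly(String s) {
        return s.replaceAll("(?U)\\P{Print}", "");
    }

    public static void main(String[] args) {
        String input = "L'Or\u00e9al\u0019"; // L'Oréal followed by End of Medium
        System.out.println(asciiPrintOnly(input));
        System.out.println(unicodePrintOnly(input));
    }
}
```

Note that with `(?U)` line breaks and tabs may still be affected, so this is only a starting point if the text contains meaningful whitespace.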

Any suggestions will be appreciated. Thanks.

As @g00se mentioned, the code below removes all control characters, including \n and \r:

String cleaned = input.replaceAll("\\p{Cntrl}", "");
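To make the trade-off concrete, here is a small sketch of what `\p{Cntrl}` does to a mixed string. The class name, method name, and sample input are made up for illustration. Accented letters survive, but tabs and line breaks are removed along with U+0019:

```java
public class CntrlDemo {

    // Removes every ASCII control character (0x00-0x1F and 0x7F).
    static String stripControls(String input) {
        return input.replaceAll("\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        // Contains é, End of Medium (U+0019), a tab and a line feed.
        String input = "L'Or\u00e9al\u0019\tline1\nline2";
        // é is kept; U+0019, the tab and the line feed are all dropped.
        System.out.println(stripControls(input));
    }
}
```

Strings in Java are immutable, so the result of `replaceAll` has to be assigned or returned; calling it and discarding the result changes nothing.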

The code below removes only the End of Medium character (U+0019):

String cleaned = input.replaceAll("\u0019", "");
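Since the actual failure is an XML parser rejecting 0x19, another option (not from the original answer) is to drop only the code points that XML 1.0 does not allow, which keeps tab, line feed, carriage return, and letters such as é. A minimal sketch, assuming the XML 1.0 `Char` ranges; `XmlSanitizer` and its method names are hypothetical:

```java
public final class XmlSanitizer {

    // True if the code point is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input without code points that XML 1.0 forbids.
    static String stripInvalidXmlChars(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlSanitizer::isValidXml10)
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // True if the input contains at least one forbidden code point.
    static boolean containsInvalidXmlChars(String input) {
        return input.codePoints().anyMatch(cp -> !isValidXml10(cp));
    }

    public static void main(String[] args) {
        String input = "L'Or\u00e9al\u0019";
        System.out.println(containsInvalidXmlChars(input));
        System.out.println(stripInvalidXmlChars(input));
    }
}
```

Iterating over code points rather than `char` values keeps supplementary characters (stored as surrogate pairs) intact, while unpaired surrogates are dropped.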

