
How to write and read special characters and symbols from XML using JAXB

I have a Java RCP application that uses JAXB to generate an XML file. It takes input (which may include special characters) from a text box, saves it to XML, and displays it again by unmarshalling the XML.

The user copies console output (which may contain special characters), pastes it into the text box, and saves it to an XML file.

<?xml version="1.0" encoding="UTF-8"?>

The JAXB version is 2.1.10, running on JDK 1.6_21.

When unmarshalling, I get an unmarshal exception:

[org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1b) was found in the element content of the document]
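The same failure can be reproduced without JAXB. The sketch below assumes the JDK's built-in DOM parser behaves like the parser underneath the unmarshaller; the class name InvalidCharRepro and the <note> element are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical reproduction helper; not part of the application in question. */
final class InvalidCharRepro {
    private InvalidCharRepro() {}

    /** Returns the parser's error message, or null if the document parsed cleanly. */
    static String parseError(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Raised for content that is not well-formed, e.g. a raw ESC (0x1B).
            return e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected here.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling InvalidCharRepro.parseError with an element whose text contains a raw ESC character should return a non-null message, while the same element with plain text should return null. The exact wording of the message depends on the parser implementation and locale.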

An invalid XML character is found when the XML is unmarshalled. I searched this forum for help and found a few links, but none of them offers a resolution or workaround. Can anyone guide me?

I have tried other encodings, with no success. Do I need to replace that character with its equivalent character code before saving/marshalling?

The following links are the closest to my problem: "Saving an escape character 0x1b in an XML file" and "Invalid Characters in XML".

A JAXB bug report describing this problem was closed with the following explanation:

Sorry, this is simply a restriction in XML.

In XML, control characters are not allowed. See the list of allowed characters at http://www.w3.org/TR/REC-xml/#NT-Char

This is not a matter of escaping http://www.w3.org/TR/REC-xml/#sec-references . Those characters like \u001C is simply not a valid character to have in XML. There's no way to transfer strings that contain those characters.

Your option is either to come up with your own string encoding scheme to make your string "XML-safe", or use binary encoding such as base64.
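As a rough sketch of the base64 option, the string can be encoded before it is placed in the marshalled object and decoded after unmarshalling. The class name Base64Text is hypothetical. The code assumes Java 8 or later for java.util.Base64; on the JDK 1.6 mentioned in the question, javax.xml.bind.DatatypeConverter.printBase64Binary and parseBase64Binary could fill the same role.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: makes arbitrary text XML-safe via Base64 (assumes Java 8+). */
final class Base64Text {
    private Base64Text() {}

    /** Encodes the text's UTF-8 bytes; the result uses only A-Z, a-z, 0-9, '+', '/', '='. */
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(), restoring control characters such as ESC exactly. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

The trade-off is that the stored value is no longer human-readable in the XML file. JAXB also maps a byte[] property to xs:base64Binary by default, so declaring the field as byte[] instead of String may be an alternative to encoding by hand.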

So, there is absolutely no way to represent these characters in XML. If exact representation of these strings is not critical for your application, you can simply remove these characters or replace them with placeholders; otherwise you have to encode the strings using a safe encoding scheme such as Base64.
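For the remove-or-replace option, a minimal sketch might filter the text against the Char production of the XML 1.0 specification linked above before marshalling. The class and method names are hypothetical, not part of JAXB.

```java
/** Hypothetical helper (not part of JAXB): filters text against the XML 1.0 Char production. */
final class XmlSanitizer {
    private XmlSanitizer() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 forbids. */
    static String strip(String text) {
        return replace(text, "");
    }

    /** Substitutes the placeholder for every code point that XML 1.0 forbids. */
    static String replace(String text, String placeholder) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(placeholder);
            }
        }
        return out.toString();
    }
}
```

For example, XmlSanitizer.replace(input, "?") keeps tabs and newlines but turns an ESC (0x1B) into "?". This is lossy by design, so it only suits the case where the exact original text does not matter.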

If you don't want to remove the control characters, you can escape them instead.
You can use java.net.URLEncoder to encode your data on the server side and then decode it on the client side using java.net.URLDecoder.
It works like a charm; I have used it for the same purpose and it works fine.
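A minimal sketch of that approach, assuming UTF-8 on both sides (the wrapper class UrlSafeText is hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical wrapper around URLEncoder/URLDecoder for storing text in XML. */
final class UrlSafeText {
    private UrlSafeText() {}

    /** Percent-encodes the text, so control characters become sequences like %1B. */
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encode(), restoring the original characters. */
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The two-argument encode(String, String) overload is used because it is available on Java 6. Note that URLEncoder also rewrites spaces as "+" and percent-encodes most punctuation, so the stored text stays partly readable but is not identical to the input until it is decoded.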

If you manually replace 0x1b with ? in code, some other day you will run into a different control character. So I think the better way is to use the encoder/decoder if you want to preserve the data; otherwise, remove it.

You can refer to my question here: Illegal character - CTRL-CHAR
