
How to write and read special characters and symbols from XML using JAXB

I have a Java RCP application that uses JAXB to generate an XML file. It takes input (which may include special characters) from a text box, saves it to XML, and displays it again by unmarshalling the XML.

The user copies console output (which may contain special characters), pastes it into the text box, and saves it to the XML file.

<?xml version="1.0" encoding="UTF-8"?>

The JAXB version is 2.1.10, running on JDK 1.6_21.

When unmarshalling, receiving an unmarshall exception:

[org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1b) was found in the element content of the document]

The parser finds an invalid XML character when unmarshalling the XML. I searched this forum and found a few related links, but none of them offers a resolution or workaround. Can anyone guide me?

I have tried other encodings, without success. Do I need to replace that character with its equivalent character code before saving/marshalling?

The following links are closest to my problem: "Saving an escape character 0x1b in an XML file" and "Invalid Characters in XML".

A JAXB bug report describing this problem was closed with the following explanation:

Sorry, this is simply a restriction in XML.

In XML, control characters are not allowed. See the list of allowed characters at http://www.w3.org/TR/REC-xml/#NT-Char

This is not a matter of escaping http://www.w3.org/TR/REC-xml/#sec-references . Those characters like \ is simply not a valid character to have in XML. There's no way to transfer strings that contain those characters.

Your option is either to come up with your own string encoding scheme to make your string "XML-safe", or use binary encoding such as base64.

So there is no way to represent these characters in an XML 1.0 document, not even as character references. If exact representation of these strings is not critical for your application, you can simply remove the characters or replace them with placeholders; otherwise you have to encode the strings with a safe encoding scheme such as Base64.
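As a rough illustration of both options, here is a minimal sketch. It assumes Java 8 or later for java.util.Base64 (on JDK 6 with JAXB, javax.xml.bind.DatatypeConverter.printBase64Binary and parseBase64Binary should serve the same purpose). The class and method names are hypothetical, and the validity check follows the Char production of the XML 1.0 specification linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; the names are illustrative, not part of JAXB.
final class XmlSafeText {

    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Lossy option: drop every code point that XML 1.0 forbids.
    static String stripInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Lossless option: store the text as Base64 of its UTF-8 bytes.
    static String toBase64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of toBase64, to be applied after unmarshalling.
    static String fromBase64(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Console output with ANSI escape sequences (0x1b), as in the question.
        String console = "\u001b[31mERROR\u001b[0m";
        System.out.println(stripInvalidXmlChars(console));
        System.out.println(fromBase64(toBase64(console)).equals(console));
    }
}
```

Stripping is enough if the text is only meant to be read by a person. Base64 keeps the data intact, but the XML content is no longer human-readable, and the field has to be encoded before marshalling and decoded after unmarshalling.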

If you don't want to remove the control characters, you can encode them instead.
You can use java.net.URLEncoder to encode your data on the server side and then decode it on the client side using java.net.URLDecoder.
It works like a charm; I have used it for the same purpose and it works fine.

If you manually replace 0x1b with ? in code, sooner or later you will run into some other control character. So I think the better way is to use the encoder/decoder if you want to preserve the data; otherwise remove it.
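A minimal sketch of that round trip, assuming UTF-8 on both sides. The class and method names are hypothetical; only URLEncoder.encode(String, String) and URLDecoder.decode(String, String) are actual JDK calls.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical wrapper around URLEncoder/URLDecoder; the names are illustrative.
final class UrlEncodedText {

    // Percent-encodes the text so that no control character reaches the XML.
    static String encode(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Restores the original text after unmarshalling.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String console = "\u001b[31mERROR\u001b[0m";
        String stored = encode(console);   // value to put into the JAXB field
        System.out.println(stored);
        System.out.println(decode(stored).equals(console));
    }
}
```

Note that URLEncoder also rewrites spaces and most punctuation, so the stored value is longer and harder to read than the original text, but decoding restores it exactly.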

You can refer to my question here: "Illegal character - CTRL-CHAR".
