
Can't Java's XMLStreamReader handle attribute values containing characters from higher Unicode planes?

Let's create an XML file with two attribute values which contain an extended Unicode character:

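import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.*;

XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ERROR_XML), StandardCharsets.UTF_8))) {
    XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(writer);

    xmlStreamWriter.writeStartDocument();
    xmlStreamWriter.writeCharacters("\n");
    xmlStreamWriter.writeStartElement("start");
    xmlStreamWriter.writeAttribute("test1", "1𩸽1"); // 𩸽 is U+29E3D, outside the Basic Multilingual Plane
    xmlStreamWriter.writeAttribute("test2", "2𩸽2");
    xmlStreamWriter.writeEndElement();
    xmlStreamWriter.writeEndDocument();
    xmlStreamWriter.flush(); // push any buffered output into the writer before it is closed
    xmlStreamWriter.close(); // frees the StAX writer; does not close the underlying BufferedWriter
}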

The generated file looks like this:

<?xml version="1.0" ?>
<start test1="1𩸽1" test2="2𩸽2"></start>
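To make "extended Unicode character" concrete: 𩸽 is U+29E3D, a supplementary character outside the Basic Multilingual Plane. The following minimal sketch (class and method names are hypothetical) shows that Java stores such a character as a surrogate pair of two `char`s and that UTF-8 encodes it as four bytes. Surrogate pairs are one plausible place for a parser to go wrong, but this sketch only illustrates the encoding; it does not reproduce the bug.

```java
import java.nio.charset.StandardCharsets;

// Illustrates why U+29E3D counts as an "extended" (supplementary) character.
class SupplementaryCharDemo {

    // Built from its code point so the source file stays pure ASCII.
    static final String HOKKE = new String(Character.toChars(0x29E3D));

    // Number of Java chars, i.e. UTF-16 code units.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes in the UTF-8 encoding.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "2" + HOKKE + "2"; // same shape as the test2 attribute
        System.out.println(Character.isSupplementaryCodePoint(0x29E3D)); // true
        System.out.println(Character.isHighSurrogate(HOKKE.charAt(0)));  // true
        System.out.println(utf16Units(value) + " " + codePoints(value) + " " + utf8Bytes(value)); // 4 3 6
    }
}
```

So the three-character value `2𩸽2` is four `char`s in Java and six bytes on disk, which is the kind of mismatch a reader's internal buffering has to get right.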

If the file is read back in and the attribute values are examined:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(ERROR_XML), StandardCharsets.UTF_8))) {
    XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(reader);

    xmlStreamReader.nextTag(); // skip the prolog whitespace and move to <start>
    if (XMLStreamReader.START_ELEMENT == xmlStreamReader.getEventType()
            && "start".equals(xmlStreamReader.getLocalName())) {
        System.out.println(xmlStreamReader.getAttributeValue(0));
        System.out.println(xmlStreamReader.getAttributeValue(1));
    }
    xmlStreamReader.close(); // does not close the underlying BufferedReader
}

this prints:

1𩸽1
2𩸽𩸽2

Astonishingly, the second attribute value contains the extended Unicode character twice!

Each further attribute value containing an extended character increases this count. In one case I received attribute values with 12,000 identical characters instead of one. What is happening here?

This is a bug in the corresponding class of the Java API, i.e. the JDK's built-in XMLStreamReader implementation.
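To check whether a particular JDK shows this behaviour, the two snippets from the question can be combined into a single in-memory round trip. This is a sketch: the class name is hypothetical, and it assumes that reading from a `StringReader` exercises the same code path as the file-based `BufferedReader` in the question, which I have not verified. It uses only the standard `javax.xml.stream` API, so it runs against whatever StAX implementation `XMLInputFactory.newInstance()` resolves to.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Writes the question's document with the default StAX writer, reads it back
// with the default StAX reader, and returns the attribute values of <start>.
class StaxRoundTrip {

    static final String HOKKE = new String(Character.toChars(0x29E3D)); // U+29E3D

    static String write() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeCharacters("\n");
        w.writeStartElement("start");
        w.writeAttribute("test1", "1" + HOKKE + "1");
        w.writeAttribute("test2", "2" + HOKKE + "2");
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();
        w.close();
        return out.toString();
    }

    static List<String> readAttributes(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> values = new ArrayList<>();
        r.nextTag(); // moves to <start>
        for (int i = 0; i < r.getAttributeCount(); i++) {
            values.add(r.getAttributeValue(i));
        }
        r.close();
        return values;
    }

    // Wraps the checked StAX exception so callers need no throws clause.
    static List<String> roundTrip() {
        try {
            return readAttributes(write());
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX round trip failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> values = roundTrip();
        String expected2 = "2" + HOKKE + "2";
        System.out.println("test1 = " + values.get(0));
        System.out.println("test2 = " + values.get(1));
        // Prints false if the second value came back altered (e.g. duplicated character).
        System.out.println("test2 as written: " + expected2.equals(values.get(1)));
    }
}
```

If the last line prints `false`, the JDK in use appears to be affected; which JDK versions are affected is not something this sketch establishes.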

You can use Woodstox (woodstox.jar), which handles this correctly. All you need to do is modify the code that reads the XML file as follows:

  • XMLStreamReader2 instead of XMLStreamReader
  • XMLInputFactory2 instead of XMLInputFactory

With these changes it works correctly; I have tested it myself.

You can download woodstox.jar from http://wiki.fasterxml.com/WoodstoxDownload.
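After adding woodstox.jar to the classpath, it can be worth confirming which StAX implementation is actually in use. The sketch below assumes Woodstox registers its factory (`com.ctc.wstx.stax.WstxInputFactory`) through the standard `META-INF/services` lookup that `XMLInputFactory.newInstance()` performs; if the `javax.xml.stream.XMLInputFactory` system property is set, it takes precedence over that lookup. The class and method names are hypothetical. Note that the `XMLInputFactory2`/`XMLStreamReader2` types from the list above come from the Stax2 API (`org.codehaus.stax2`), so they are only usable when a Stax2 implementation such as Woodstox is loaded.

```java
import javax.xml.stream.XMLInputFactory;

// Reports which StAX input factory XMLInputFactory.newInstance() resolves to.
// Expected (not guaranteed) to be Woodstox when woodstox.jar is on the classpath,
// otherwise the JDK's built-in factory.
class StaxImplCheck {

    static String inputFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    static boolean isWoodstox() {
        return inputFactoryClass().startsWith("com.ctc.wstx.");
    }

    public static void main(String[] args) {
        System.out.println("XMLInputFactory implementation: " + inputFactoryClass());
        System.out.println("Woodstox in use: " + isWoodstox());
    }
}
```

Running this with and without the jar on the classpath shows whether the reader code is really going through Woodstox or still through the JDK's built-in class.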
