Can't Java XMLStreamReader have attribute values with higher Unicode planes?

Let's create an XML file with two attribute values, each of which contains a character from outside the Basic Multilingual Plane:

XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ERROR_XML), "UTF-8"))) {
    XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(writer);

    xmlStreamWriter.writeStartDocument();
    xmlStreamWriter.writeCharacters("\n");
    xmlStreamWriter.writeStartElement("start");
    // Both attribute values contain one supplementary character.
    xmlStreamWriter.writeAttribute("test1", "1𩸽1");
    xmlStreamWriter.writeAttribute("test2", "2𩸽2");
    xmlStreamWriter.writeEndElement();
    xmlStreamWriter.writeEndDocument();
    // Flush any buffered output before the underlying writer is closed.
    xmlStreamWriter.flush();
}

The generated file looks like this:

<?xml version="1.0" ?>
<start test1="1𩸽1" test2="2𩸽2"></start>

If this file is read back in and the attribute values are examined:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(ERROR_XML), "UTF-8"))) {
    XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(reader);

    // Advance to the root element and print both attribute values.
    xmlStreamReader.nextTag();
    if (XMLStreamReader.START_ELEMENT == xmlStreamReader.getEventType() &&
        "start".equals(xmlStreamReader.getLocalName()))
    {
        System.out.println(xmlStreamReader.getAttributeValue(0));
        System.out.println(xmlStreamReader.getAttributeValue(1));
    }
}

this will print:

1𩸽1
2𩸽𩸽2

Astonishingly, the second attribute value contains the supplementary character twice!

Every further attribute value that contains such a character increases this count. In one case I received attribute values with 12000 identical characters instead of one. What is happening here?

There is a bug in the corresponding class of the Java API, that is, in the XMLStreamReader implementation you get by default from the JDK.

You can use Woodstox ("woodstox.jar") to read the file correctly. All you need to do is modify the code that reads the XML file as follows:

  • XMLStreamReader2 instead of XMLStreamReader
  • XMLInputFactory2 instead of XMLInputFactory

It will then work correctly. I have tested it myself.

You can find "woodstox.jar" at http://wiki.fasterxml.com/WoodstoxDownload .
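To check which StAX implementation is actually being used and whether a value was duplicated, a small self-contained test along the following lines may help. It is only a sketch: the class name AttributeCodePointCheck and its helper methods are made up for illustration, the XML is parsed from an in-memory string rather than from the file in the question, and whether the duplication shows up will depend on the JDK version and on whether Woodstox is on the classpath. It counts code points rather than chars, because a supplementary character occupies two chars (a surrogate pair) in a Java String.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AttributeCodePointCheck {

    // U+29E3D, a character outside the Basic Multilingual Plane, written as a
    // surrogate pair so that the encoding of this source file does not matter.
    public static final String SUPPLEMENTARY = "\uD867\uDE3D";

    /** Counts Unicode code points, so a surrogate pair counts as one character. */
    public static int codePoints(String value) {
        return value.codePointCount(0, value.length());
    }

    /** Parses the XML and returns the attribute values of the root element. */
    public static String[] rootAttributes(XMLInputFactory factory, String xml)
            throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag();
            String[] values = new String[reader.getAttributeCount()];
            for (int i = 0; i < values.length; i++) {
                values[i] = reader.getAttributeValue(i);
            }
            return values;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<start test1=\"1" + SUPPLEMENTARY + "1\" test2=\"2"
                + SUPPLEMENTARY + "2\"></start>";

        // newInstance() returns whichever implementation the JDK discovers.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("StAX implementation: " + factory.getClass().getName());

        for (String value : rootAttributes(factory, xml)) {
            // Each value was written with 3 code points; more means duplication.
            System.out.println(value + " -> " + codePoints(value) + " code points");
        }
    }
}
```

Running this once with the plain JDK and once with Woodstox on the classpath should show, from the printed implementation class name, which parser handled the input, and any attribute value reporting more than 3 code points points to the duplication described in the question.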
