
StAX - Setting the version and encoding using XMLStreamWriter

I am using StAX to create XML files and then validating each file against an XSD.

I am getting an error while creating the XML file:

javax.xml.stream.XMLStreamException: Underlying stream encoding 'Cp1252' and input paramter for writeStartDocument() method 'UTF-8' do not match.
        at com.sun.xml.internal.stream.writers.XMLStreamWriterImpl.writeStartDocument(XMLStreamWriterImpl.java:1182)

Here is the code snippet:

XMLOutputFactory xof = XMLOutputFactory.newInstance();

try {
  XMLStreamWriter xtw = xof.createXMLStreamWriter(new FileWriter(fileName));
  xtw.writeStartDocument("UTF-8", "1.0");
} catch (XMLStreamException e) {
  e.printStackTrace();
} catch (IOException ie) {
  ie.printStackTrace();
}

I am running this code on Unix. Does anybody know how to set the version and encoding style?

I would try the createXMLStreamWriter() overload that takes an output stream and an encoding, too.

[EDIT] Tried it; it works after changing the createXMLStreamWriter line:

XMLStreamWriter xtw = xof.createXMLStreamWriter(new FileOutputStream(fileName), "UTF-8");

[EDIT 2] Made a slightly more complex test, for the record:

String fileName = "Test.xml";
XMLOutputFactory xof =  XMLOutputFactory.newInstance();
XMLStreamWriter xtw = null;
try
{
  xtw = xof.createXMLStreamWriter(new FileOutputStream(fileName), "UTF-8");
  xtw.writeStartDocument("UTF-8", "1.0");
  xtw.writeStartElement("root");
  xtw.writeComment("This is an attempt to create an XML file with StAX");

  xtw.writeStartElement("foo");
  xtw.writeAttribute("order", "1");
    xtw.writeStartElement("meuh");
    xtw.writeAttribute("active", "true");
      xtw.writeCharacters("The cows are flying high this Spring");
    xtw.writeEndElement();
  xtw.writeEndElement();

  xtw.writeStartElement("bar");
  xtw.writeAttribute("order", "2");
    xtw.writeStartElement("tcho");
    xtw.writeAttribute("kola", "K");
      xtw.writeCharacters("Content of tcho tag");
    xtw.writeEndElement();
  xtw.writeEndElement();

  xtw.writeEndElement();
  xtw.writeEndDocument();
}
catch (XMLStreamException e)
{
  e.printStackTrace();
}
catch (IOException ie)
{
  ie.printStackTrace();
}
finally
{
  if (xtw != null)
  {
    try
    {
      xtw.close();
    }
    catch (XMLStreamException e)
    {
      e.printStackTrace();
    }
  }
}
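One detail the test above leaves open: according to its Javadoc, XMLStreamWriter.close() does not close the underlying output stream, so the FileOutputStream stays open. A minimal sketch of one way to handle that with try-with-resources follows. The class name StaxFileDemo, the method writeDocument, and the temp-file name are illustrative assumptions, not part of the original post.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxFileDemo {

    // Hypothetical helper: writes a small document to 'target' as UTF-8.
    static void writeDocument(Path target) throws IOException, XMLStreamException {
        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        // try-with-resources closes the file stream; XMLStreamWriter.close() does not.
        try (OutputStream out = Files.newOutputStream(target)) {
            XMLStreamWriter xtw = xof.createXMLStreamWriter(out, "UTF-8");
            try {
                xtw.writeStartDocument("UTF-8", "1.0");
                xtw.writeStartElement("root");
                xtw.writeStartElement("foo");
                xtw.writeAttribute("order", "1");
                xtw.writeCharacters("The cows are flying high this Spring");
                xtw.writeEndElement(); // </foo>
                xtw.writeEndElement(); // </root>
                xtw.writeEndDocument();
                xtw.flush();           // push buffered output to the stream
            } finally {
                xtw.close();           // frees writer resources only
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("stax-demo", ".xml");
        writeDocument(target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

XMLStreamWriter does not implement AutoCloseable, which is why the writer is closed in an inner finally block rather than listed as a resource.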

This should work:

// ...
Writer writer = new OutputStreamWriter(new FileOutputStream(fileName), "UTF-8");
XMLStreamWriter xtw = xof.createXMLStreamWriter(writer);
xtw.writeStartDocument("UTF-8", "1.0");
// ...
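To try the Writer-based variant without touching the file system, here is a small self-contained sketch that writes into a ByteArrayOutputStream instead of a file. The class name WriterVariantDemo, the method render, and the sample element are assumptions made for illustration; only the three calls from the snippet above come from the answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class WriterVariantDemo {

    // Hypothetical helper: builds a tiny document in memory and returns it as a String.
    static String render() throws IOException, XMLStreamException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        // The Writer's charset is stated explicitly, so it agrees with the declaration below.
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            XMLStreamWriter xtw = xof.createXMLStreamWriter(writer);
            xtw.writeStartDocument("UTF-8", "1.0");
            xtw.writeStartElement("root");
            xtw.writeCharacters("caf\u00e9"); // non-ASCII text to exercise the encoding
            xtw.writeEndElement();
            xtw.writeEndDocument();
            xtw.flush();
            xtw.close();
        }
        // Decode with the same charset that was used for writing.
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

If the Writer were created with a different charset than the one passed to writeStartDocument(), the bundled JDK implementation would be expected to throw the XMLStreamException quoted in the question.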

From the code alone it is hard to know for sure, but if you are relying on the default Stax implementation that JDK 1.6 provides (Sun's Sjsxp), I would recommend upgrading to Woodstox. It is known to be less buggy than Sjsxp, supports the whole Stax2 API, and has been actively developed and supported (whereas the Sun version was written once and has received only a limited number of bug fixes).

But the bug in your code is this:

XMLStreamWriter xtw = xof.createXMLStreamWriter(new FileWriter(fileName));

You are relying on the default platform encoding (which must be CP-1252; Windows?). You should always explicitly specify the encoding you are using. The stream writer is just verifying that you are not doing something dangerous, and it spotted an inconsistency that could produce a corrupt document. Pretty smart, which actually suggests that this is not the default Stax processor. :-)

(The other answer points out a correct workaround, too: just pass an OutputStream and the encoding, and let XMLStreamWriter do the right thing.)
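To see the platform-default dependency directly, the following sketch prints the encoding that a FileWriter picks up implicitly next to one that is specified explicitly. The class and method names (WriterEncodingDemo, implicitEncoding, explicitEncoding) are illustrative assumptions. Note that on JDK 18 and later the default charset is UTF-8 unless overridden, so the mismatch may only show up on older runtimes or when file.encoding is set.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {

    // FileWriter(File) silently uses the platform default charset.
    static String implicitEncoding(File f) throws IOException {
        try (FileWriter w = new FileWriter(f)) {
            return w.getEncoding();
        }
    }

    // OutputStreamWriter lets the caller state the charset explicitly.
    static String explicitEncoding(File f) throws IOException {
        try (OutputStreamWriter w =
                 new OutputStreamWriter(new FileOutputStream(f), StandardCharsets.UTF_8)) {
            return w.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("encoding-demo", ".xml");
        tmp.deleteOnExit();
        System.out.println("Default charset:     " + Charset.defaultCharset());
        System.out.println("FileWriter encoding: " + implicitEncoding(tmp));
        System.out.println("Explicit encoding:   " + explicitEncoding(tmp));
    }
}
```

getEncoding() may return a historical name rather than the canonical one, so compare via Charset.forName(...) rather than by raw string.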

If using the default XMLStreamWriter bundled with the Oracle JRE/JDK, you should always:

  • create the XMLStreamWriter, explicitly setting the character encoding: xmlOutputFactory.createXMLStreamWriter(out, encoding)
  • start the document, explicitly setting the encoding: xmlStreamWriter.writeStartDocument(encoding, version). The writer is not smart enough to remember the encoding set when it was created. However, it does check that the two encodings are the same. See the code below.

This way, your file encoding and XML declaration are always in sync. Although specifying the encoding in the XML declaration is optional, XML best practice is to always specify it.

This is the code from the Oracle (Sun) implementation (Sjsxp):

String streamEncoding = null;
if (fWriter instanceof OutputStreamWriter) {
    streamEncoding = ((OutputStreamWriter) fWriter).getEncoding();
}
else if (fWriter instanceof UTF8OutputStreamWriter) {
    streamEncoding = ((UTF8OutputStreamWriter) fWriter).getEncoding();
}
else if (fWriter instanceof XMLWriter) {
    streamEncoding = ((OutputStreamWriter) ((XMLWriter)fWriter).getWriter()).getEncoding();
}

if (streamEncoding != null && !streamEncoding.equalsIgnoreCase(encoding)) {
    // If the equality check failed, check for charset encoding aliases
    boolean foundAlias = false;
    Set aliases = Charset.forName(encoding).aliases();
    for (Iterator it = aliases.iterator(); !foundAlias && it.hasNext(); ) {
        if (streamEncoding.equalsIgnoreCase((String) it.next())) {
            foundAlias = true;
        }
    }
    // If no alias matches the encoding name, then report error
    if (!foundAlias) {
        throw new XMLStreamException("Underlying stream encoding '"
                + streamEncoding
                + "' and input paramter for writeStartDocument() method '"
                + encoding + "' do not match.");
    }
}
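The alias lookup in that excerpt is what lets a Writer reporting one spelling of an encoding name pass the check against a declaration that uses another. Below is a simplified, standalone sketch of the same comparison, not the JDK's actual code; the class EncodingAliasCheck and the method namesMatch are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;

public class EncodingAliasCheck {

    // True if the names are equal ignoring case, or if streamEncoding
    // is a registered alias of the declared charset.
    static boolean namesMatch(String streamEncoding, String declared) {
        if (streamEncoding.equalsIgnoreCase(declared)) {
            return true;
        }
        for (String alias : Charset.forName(declared).aliases()) {
            if (streamEncoding.equalsIgnoreCase(alias)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "UTF8" is the historical name an OutputStreamWriter reports for UTF-8.
        System.out.println(namesMatch("UTF8", "UTF-8"));
        // The combination from the question's error message.
        System.out.println(namesMatch("Cp1252", "UTF-8"));
    }
}
```

Like the excerpt, the sketch only consults the aliases of the declared name, so the comparison is not symmetric.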
