
UTF-8 write xml successful

Today I ran into a very interesting problem while trying to rewrite an XML file.

I have 3 ways to do this, and I want to know which one is best and what causes the problem.

I.

File file = new File(REAL_XML_PATH);
try {
    FileWriter fileWriter = new FileWriter(file);
    XMLOutputter xmlOutput = new XMLOutputter();

    xmlOutput.output(document, System.out);
    xmlOutput.output(document, fileWriter);

    fileWriter.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

In this case I have a big problem with my app. After writing text in my own language to the file, I can't read anything back. The file's encoding was changed to ANSI, and I get: javax.servlet.ServletException: javax.servlet.jsp.JspException: Invalid argument looking up property: "document.rootElement.children[0].children"

II.

File file = new File(REAL_XML_PATH);
XMLOutputter output = new XMLOutputter();
try {
    output.output(document, new FileOutputStream(file));
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

In this case I have no problems. The encoding wasn't changed, and reading and writing both work.

I also found this article: http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html

I want to know the best way to do this and the reason for the problem.

Well, this looks like the problem:

FileWriter fileWriter = new FileWriter(file); 

That will always use the platform default encoding, which is rarely what you want. Suppose your default encoding is ISO-8859-1. If your document declares itself to be encoded in UTF-8, but you actually write everything in ISO-8859-1, then your file will be invalid if you have any non-ASCII characters: you'll end up writing them out with the ISO-8859-1 single-byte representation, which isn't valid UTF-8.
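To make the mismatch concrete, here is a small sketch using only the JDK. The class name EncodingMismatchDemo and the sample string are made up for illustration; the point is simply that the ISO-8859-1 bytes of a non-ASCII character fail a strict UTF-8 decode, which is what an XML parser trusting the UTF-8 declaration would run into.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or of JDOM.
public class EncodingMismatchDemo {

    // Strictly decodes the bytes as UTF-8; returns false on any malformed sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café": the last character is non-ASCII

        // What a writer using an ISO-8859-1 default would put on disk.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // What the UTF-8 declaration promises is on disk.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes readable as UTF-8? " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes readable as UTF-8? " + isValidUtf8(utf8));
    }
}
```

Pure-ASCII content encodes identically in both charsets, which is why the problem only shows up once text in another language is written.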

I would actually provide a stream to XMLOutputter rather than a Writer. That way there's no room for conflict between the encoding declared by the file and the encoding used by the writer. So just change your code to:

FileOutputStream fileOutput = new FileOutputStream(file);
...
xmlOutput.output(document, fileOutput);

... as I now see you've done in your second bit of code. So yes, this is the preferred approach. Here, the stream makes no assumptions about the encoding to use, because it's just going to handle binary data. The XML writing code gets to decide what that binary data will be, and it can make sure that the character encoding it really uses matches the declaration at the start of the file.
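If some API ever forces a Writer on you, one way to avoid the platform default is to wrap a stream in an OutputStreamWriter with an explicit charset that matches the XML declaration. This is a side note rather than part of the recommendation above; the class ExplicitCharsetWriter and its encode helper are hypothetical names, and the XML string is written by hand instead of by XMLOutputter so that the sketch runs with the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to illustrate pinning a Writer's charset.
public class ExplicitCharsetWriter {

    // Writes the text through a Writer whose charset is chosen explicitly,
    // so the platform default never comes into play.
    public static byte[] encode(String xml, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(xml);
        } // closing the writer flushes it into the byte array
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>caf\u00e9</root>";
        byte[] data = encode(xml, StandardCharsets.UTF_8);

        // The bytes decode back to the same text with the declared encoding.
        String roundTrip = new String(data, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(xml));
    }
}
```

The stream-based call is still simpler, because the declared encoding and the real one are then picked in one place.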

You should also clean up your exception handling: don't just print a stack trace and continue on failure, and call close in a finally block instead of at the end of the try block. If you can't genuinely handle an exception, either let it propagate up the stack directly (potentially adding throws clauses to your method) or catch it, log it and then rethrow either the exception or a more appropriate one wrapping the cause.
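A rough sketch of that advice, assuming Java 7 or later: try-with-resources is used here in place of an explicit finally block (it closes the stream on both success and failure), and the IOException is rethrown wrapped in an UncheckedIOException. The names SafeXmlWrite, save and XmlBodyWriter are invented for this example; XmlBodyWriter stands in for the xmlOutput.output(document, stream) call so the code runs without JDOM on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the exception-handling advice.
public class SafeXmlWrite {

    // Stand-in for "write the document to this stream",
    // e.g. out -> xmlOutput.output(document, out) when using JDOM.
    public interface XmlBodyWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    public static void save(Path target, XmlBodyWriter body) {
        // try-with-resources closes the stream even if writeTo throws.
        try (OutputStream out = Files.newOutputStream(target)) {
            body.writeTo(out);
        } catch (IOException e) {
            // Do not swallow the failure: rethrow with context and the cause attached.
            throw new UncheckedIOException("Could not write " + target, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("document", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>caf\u00e9</root>";

        save(tmp, out -> out.write(xml.getBytes(StandardCharsets.UTF_8)));

        String back = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        System.out.println(back.equals(xml));
        Files.delete(tmp);
    }
}
```

Whether to wrap in an unchecked exception or declare throws IOException on the method is a design choice; either way the caller learns that the write failed.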

If I remember correctly, you can force your XMLOutputter to use a "pretty" format with new XMLOutputter(Format.getPrettyFormat()), so it should work with approach I too.

The pretty format is described as:

Returns a new Format object that performs whitespace beautification with 2-space indents, uses the UTF-8 encoding, doesn't expand empty elements, includes the declaration and encoding, and uses the default entity escape strategy. Tweaks can be made to the returned Format instance without affecting other instances.


 