
How to write UTF-8 data to an XML file using RandomAccessFile?

When trying to write some UTF-8 data to a file, I end up with garbage in the file. The code is as follows:

public static boolean saveToFile(StringBuffer buffer,
                                 String fileName,
                                 ArrayList exceptionList,
                                 String className)
{
    log.debug("In saveToFile for file [" + fileName + "]");

    RandomAccessFile raf = null;
    File file = new File(fileName);
    File backupFile = new File(fileName + "_bck");

    try
    {
        if (file.exists())
        {
            if (backupFile.exists())
            {
                backupFile.delete();
            }
            file.renameTo(backupFile);
        }
        raf = new RandomAccessFile(file, "rw");
        raf.writeBytes(buffer.toString());
        raf.close();

The output of buffer.toString() is:

<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>αβγδεζη

The data in the file, however, is:

<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>▒▒▒▒▒▒▒</templateName>

How can I make sure that the data in the file itself is UTF-8?

I'm not surprised you get garbage:

 raf.writeBytes(buffer.toString())

The documentation for RandomAccessFile.writeBytes(String) says (note the final clause):

Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.

In a few circumstances (when every character is in the ASCII range), that operation will result in a correctly encoded file. But in most it won't. That writeBytes() method is a foolish design by the Java developers. You need to correctly encode your text as bytes in UTF-8, and then write those bytes.
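To make the difference concrete, here is a minimal sketch that writes the same Greek text both ways and returns what lands in the file. The class name, the method names, and the temporary file are made up for illustration; only the RandomAccessFile calls come from the JDK.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical demo class, not part of the original code.
class WriteBytesDemo {

    // Writes the text with writeBytes(String), which keeps only the low
    // eight bits of every char, and returns the resulting file contents.
    static byte[] viaWriteBytes(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // "rw" does not truncate, so drop earlier contents
            raf.writeBytes(text);
        }
        return Files.readAllBytes(file.toPath());
    }

    // Encodes the text as UTF-8 first, then writes the raw bytes.
    static byte[] viaUtf8(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Files.readAllBytes(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("raf-demo", ".xml");
        file.deleteOnExit();
        String text = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma

        // One byte per char: 0xB1 0xB2 0xB3 (the 0x03 high byte is gone).
        System.out.println(Arrays.toString(viaWriteBytes(file, text)));
        // Two bytes per char: 0xCE 0xB1 0xCE 0xB2 0xCE 0xB3.
        System.out.println(Arrays.toString(viaUtf8(file, text)));
    }
}
```

The lone bytes 0xB1, 0xB2 and 0xB3 are not valid UTF-8 on their own, which is why a viewer expecting UTF-8 shows placeholder blocks.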

Do you really need to operate on the file as a random access file? If not, just manipulate it with a Writer wrapping an OutputStream.
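If random access is not needed, the Writer approach might look roughly like this. It is a sketch only: WriterDemo and writeUtf8 are hypothetical names, and it assumes the file should simply be overwritten with the buffer's contents.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original code.
class WriterDemo {

    // Overwrites the file with the given text, encoded as UTF-8.
    // A StringBuffer is a CharSequence, so it can be passed in directly.
    static void writeUtf8(Path path, CharSequence content) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                Files.newOutputStream(path), StandardCharsets.UTF_8)) {
            writer.append(content);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("writer-demo", ".xml");
        StringBuffer buffer =
                new StringBuffer("<templateName>\u03b1\u03b2\u03b3</templateName>");
        writeUtf8(path, buffer);

        // Read the file back as UTF-8 and compare with what was written.
        String readBack = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println(readBack.contentEquals(buffer));
        Files.delete(path);
    }
}
```

Passing the charset explicitly matters here: an OutputStreamWriter created without one falls back to the platform default, which may not be UTF-8.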

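You could use Charset.encode(CharBuffer) to produce a ByteBuffer holding the encoded bytes, then write those bytes to the file. Wrap the StringBuffer with CharBuffer.wrap() first, and write only the bytes up to the buffer's limit, because array() returns the whole backing array, which may be longer than the encoded data:

 ByteBuffer bytes = StandardCharsets.UTF_8.encode(CharBuffer.wrap(buffer));
 raf.write(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining());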
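A slightly more defensive variant copies the encoded bytes out of the ByteBuffer instead of touching its backing array, so it does not depend on the buffer being array-backed or on the array's size. This is a sketch under that assumption; CharsetEncodeDemo and writeEncoded are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class CharsetEncodeDemo {

    // Encodes the text as UTF-8 and writes exactly the encoded bytes
    // at the current file pointer.
    static void writeEncoded(RandomAccessFile raf, CharSequence text) throws IOException {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(text));
        // remaining() is the number of encoded bytes; the backing array
        // may have spare capacity beyond that.
        byte[] bytes = new byte[encoded.remaining()];
        encoded.get(bytes);
        raf.write(bytes);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("encode-demo", ".xml");
        file.deleteOnExit();
        StringBuffer buffer =
                new StringBuffer("<templateName>\u03b1\u03b2\u03b3</templateName>");
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            writeEncoded(raf, buffer);
        }
        System.out.println(file.length() + " bytes written");
    }
}
```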

The Javadoc for RandomAccessFile says this about writeBytes():

Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits. The write starts at the current position of the file pointer.

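Assuming that discarding parts of your String isn't what you want, the obvious candidate is writeUTF():

Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.

Note, however, that writeUTF() first writes a two-byte length and uses modified rather than standard UTF-8, so it is not suitable for a file that must be plain UTF-8 XML. For that, encode the string yourself and write the resulting bytes:

String txt = buffer.toString();
raf.write(txt.getBytes(StandardCharsets.UTF_8));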
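RandomAccessFile.writeUTF(String) stores a two-byte length in front of the encoded data, which is easy to overlook. The sketch below compares its output with plain getBytes(StandardCharsets.UTF_8); the class and method names are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical demo class, not part of the original code.
class WriteUtfDemo {

    // Writes the text with writeUTF(String) and returns the file contents:
    // a two-byte big-endian length followed by modified UTF-8 data.
    static byte[] viaWriteUTF(File file, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // start from an empty file
            raf.writeUTF(text);
        }
        return Files.readAllBytes(file.toPath());
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("writeutf-demo", ".xml");
        file.deleteOnExit();
        String text = "\u03b1\u03b2\u03b3"; // alpha, beta, gamma

        byte[] prefixed = viaWriteUTF(file, text);
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);

        // The writeUTF output is two bytes longer because of the length prefix.
        System.out.println("writeUTF: " + prefixed.length + " bytes");
        System.out.println("getBytes: " + plain.length + " bytes");
    }
}
```

An XML parser would see the two prefix bytes before the `<?xml` declaration, so writeUTF() is better kept for data that will be read back with readUTF().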
