
Reading/writing a binary file with strings?

How can I write a string to a binary file and read it back?

I've tried using writeUTF / readUTF (DataOutputStream / DataInputStream), but it was too much of a hassle.

Thanks.

Forget about FileWriter and DataOutputStream for a moment.

  • For binary data, use the OutputStream and InputStream classes. They handle byte[].
  • For text data, use the Reader and Writer classes. They handle String, which can store any kind of text because it uses Unicode internally.

The crossover between text and binary data is done by specifying an encoding. If none is given, the platform default is used (the OS encoding on older JVMs, UTF-8 since Java 18).

  • new OutputStreamWriter(outputStream, encoding)
  • string.getBytes(encoding)
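
As a minimal sketch of the two routes listed above, the following encodes the same text once through an OutputStreamWriter and once through getBytes, with the encoding given explicitly. The class and method names (EncodingCrossover, viaWriter, viaGetBytes) are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class EncodingCrossover {
    // Route 1: wrap a byte stream in a Writer that encodes with an explicit charset.
    public static byte[] viaWriter(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    // Route 2: encode directly on the String.
    public static byte[] viaGetBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // "café" with a precomposed é
        byte[] a = viaWriter(text);
        byte[] b = viaGetBytes(text);
        System.out.println(Arrays.equals(a, b)); // true: both routes give the same bytes
        System.out.println(a.length);            // 5: é takes two bytes in UTF-8
        System.out.println(new String(a, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Passing the charset explicitly on both the encoding and the decoding side keeps the result independent of the platform default.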

So if you want to avoid byte[] and use String, you must abuse a single-byte encoding that maps each of the 256 byte values to its own character. So not "UTF-8", but "ISO-8859-1", which maps every byte to the code point of the same value. "windows-1252" (also named "Cp1252") comes close, but it leaves a few byte values unassigned, so it is not guaranteed to round-trip arbitrary bytes.
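
A small sketch of what such a round trip looks like, assuming ISO-8859-1 as the carrier encoding because it assigns a character to every byte value. The class and method names (Latin1RoundTrip, bytesToString, stringToBytes) are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; ISO-8859-1 is an assumption chosen for a lossless mapping.
class Latin1RoundTrip {
    // Decode arbitrary bytes into a String, one char per byte.
    public static String bytesToString(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Encode the carrier String back into the original bytes.
    public static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String carrier = bytesToString(all);
        System.out.println(carrier.length());                           // 256
        System.out.println(Arrays.equals(all, stringToBytes(carrier))); // true

        // UTF-8 is unsuitable here: malformed sequences are replaced while decoding.
        byte[] viaUtf8 = new String(all, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(all, viaUtf8));                // false
    }
}
```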

But internally there is a conversion, and in rare cases problems can occur. For instance, in Unicode é can be a single code point (U+00E9) or two: e followed by the combining acute accent (U+0301). java.text.Normalizer converts between these forms.
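
To illustrate the é case, here is a short sketch using java.text.Normalizer. The class name NormalizeDemo and the helper nfc are made up for the example.

```java
import java.text.Normalizer;

// Hypothetical demo class showing composed vs. decomposed forms of the same letter.
class NormalizeDemo {
    // Normalize to NFC, the composed form.
    public static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00e9";    // é as one code point
        String decomposed = "e\u0301"; // e + combining acute accent
        System.out.println(composed.equals(decomposed));      // false: different chars
        System.out.println(decomposed.length());              // 2
        System.out.println(nfc(decomposed).equals(composed)); // true after normalization
    }
}
```

The two strings look identical on screen but encode to different bytes, which is why comparing or hashing them without normalizing first can go wrong.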

One case where this has already caused problems is file names on different operating systems: macOS uses a different Unicode normalisation than Windows, so file names need special attention in version control systems.

So in principle it is better to use the more cumbersome byte arrays, ByteArrayInputStream, or java.nio buffers. Also bear in mind that the chars of a String are 16 bits wide.
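
One way the byte-array route could look, as a rough sketch: the string is encoded to UTF-8 and stored behind a 4-byte length prefix in a java.nio ByteBuffer. The record layout and the names (LengthPrefixed, encode, decode) are assumptions made for this example, not something the answer prescribes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class. Assumed layout: 4-byte big-endian length, then UTF-8 bytes.
class LengthPrefixed {
    public static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + utf8.length);
        buf.putInt(utf8.length).put(utf8);
        return buf.array();
    }

    public static String decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int length = buf.getInt();      // number of encoded bytes that follow
        byte[] utf8 = new byte[length];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";
        byte[] record = encode(text);
        System.out.println(text.length());               // 5 chars in the String
        System.out.println(record.length);               // 10 = 4-byte prefix + 6 UTF-8 bytes
        System.out.println(decode(record).equals(text)); // true
    }
}
```

The prefix counts encoded bytes, not chars, since the two differ as soon as the text contains non-ASCII characters.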

If you want to write text, you can use Writers and Readers.
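
A minimal sketch of the Writer/Reader route, assuming one string per line and UTF-8 as the file encoding. The names (TextLines, writeLines, readLines) are invented for this illustration, and strings that themselves contain line breaks would not survive this format.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: one string per line, UTF-8 encoded (both are assumptions).
class TextLines {
    public static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        List<String> words = Arrays.asList("alpha", "b\u00eata", "gamma");
        writeLines(file, words);
        System.out.println(words.equals(readLines(file))); // true
        Files.delete(file);
    }
}
```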

You can use DataOutputStream.writeUTF / DataInputStream.readUTF, but each string must encode to at most 65535 bytes (the format uses a two-byte length prefix and a modified UTF-8 encoding), so for non-ASCII text the limit can be well below 64K characters.
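
The size limit can be observed directly, and one possible workaround for longer strings is to write an int length followed by the UTF-8 bytes. The sketch below shows both. The names (LongStrings, fitsWriteUTF, writeLong, readLong, repeat) are hypothetical, and the int-prefixed layout is an assumption of this example, not a standard format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for the writeUTF size limit and an int-prefixed alternative.
class LongStrings {
    // True if writeUTF accepts the string, false if its encoded form is too long.
    public static boolean fitsWriteUTF(String s) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(new ByteArrayOutputStream())) {
            dos.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        }
    }

    // Assumed layout: 4-byte length, then that many UTF-8 bytes.
    public static void writeLong(DataOutputStream dos, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        dos.writeInt(utf8.length);
        dos.write(utf8);
    }

    public static String readLong(DataInputStream dis) throws IOException {
        byte[] utf8 = new byte[dis.readInt()];
        dis.readFully(utf8); // blocks until the whole array is filled
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Build a string of n copies of 'x' (works on older Java versions too).
    public static String repeat(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'x');
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fitsWriteUTF(repeat(65535))); // true
        System.out.println(fitsWriteUTF(repeat(65536))); // false

        String big = repeat(100000);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            writeLong(dos, big);
        }
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            System.out.println(big.equals(readLong(dis))); // true
        }
    }
}
```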


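import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// The class name is arbitrary; it only makes the snippet compilable.
public class Main {
    public static void main(String... args) throws IOException {
        // Generate a million pseudo-random words.
        List<String> words = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++)
            words.add(Long.toHexString(System.nanoTime()));

        writeStrings("words", words);
        List<String> words2 = readWords("words");
        System.out.println("Words are the same is " + words.equals(words2));
    }

    // Reads the word count, then that many strings written with writeUTF.
    public static List<String> readWords(String filename) throws IOException {
        // try-with-resources closes the stream even if reading fails.
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(filename)))) {
            int count = dis.readInt();
            List<String> words = new ArrayList<String>(count);
            while (words.size() < count)
                words.add(dis.readUTF());
            return words;
        }
    }

    // Writes the word count first so the reader knows how many strings follow.
    public static void writeStrings(String filename, List<String> words) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            dos.writeInt(words.size());
            for (String word : words)
                dos.writeUTF(word);
        }
    }
}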

prints

Words are the same is true


 