What is the absolute fastest way to read and write strings from a file with Java?

I need to read a file of known format into a String[] (where each line is one item in the array) and then write it back to the file.

The reading, in particular, must be as fast as possible.

Is there a better way than just using a BufferedReader and reading line by line into an array?

Consider using Google protobuf.

Just a crazy idea: you could write the length of each string into the file, ahead of the string itself. Something like:

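import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;

// Each record: a 2-byte big-endian length, then that many bytes (so all strings must be under 64K).
DataInputStream stream = new DataInputStream(new BufferedInputStream(new FileInputStream("file.bin")));
byte[] buff = new byte[256];
String[] result = new String[10];
for (int i = 0; i < 10; i++) {
    int n = stream.readUnsignedShort();        // string length in bytes
    if (buff.length < n) buff = new byte[n];   // grow the scratch buffer when needed
    stream.readFully(buff, 0, n);              // a plain read(...) may return fewer than n bytes
    result[i] = new String(buff, 0, n);        // decodes with the platform default charset
}
stream.close();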

This frees the reader from checking every input byte for \n. Though I'm not sure that this will be faster than readLine().
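For completeness, here is a minimal sketch of the matching writer together with a reader, so the format can be round-tripped. The class name LengthPrefixedStrings, the 4-byte item count at the start of the file, and the choice of UTF-8 are assumptions made for this illustration; the answer above only specifies the 2-byte length in front of each string.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LengthPrefixedStrings {

    // Writes a 4-byte item count (an assumption of this sketch), then each
    // string as a 2-byte length followed by its UTF-8 bytes.
    static void write(Path file, String[] items) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(items.length);
            for (String s : items) {
                byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
                if (bytes.length > 0xFFFF) {
                    throw new IOException("string too long for a 2-byte length prefix");
                }
                out.writeShort(bytes.length);
                out.write(bytes);
            }
        }
    }

    // Reads the file back without scanning for line terminators.
    static String[] read(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            String[] result = new String[in.readInt()];
            byte[] buff = new byte[256];
            for (int i = 0; i < result.length; i++) {
                int n = in.readUnsignedShort();
                if (buff.length < n) buff = new byte[n];
                in.readFully(buff, 0, n);
                result[i] = new String(buff, 0, n, StandardCharsets.UTF_8);
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("strings", ".bin");
        write(file, new String[] {"alpha", "", "héllo"});
        System.out.println(String.join("|", read(file)));
        Files.delete(file);
    }
}
```

Whether this beats readLine() in practice would need measuring on the real data; the sketch only shows that the format is easy to read and write symmetrically.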

Use NIO and UTF-8 encoders/decoders which take advantage of your string statistics and also take advantage of JIT optimizations. I believe aalto out / in are doing this, and I am sure you can find others.
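As a rough illustration of the NIO route, the sketch below reads the whole file through a FileChannel, decodes it with the JDK's built-in UTF-8 decoder, and splits on line feeds; writing goes the other way. It uses only standard java.nio classes and does not attempt the statistics-aware decoding mentioned above. The class name NioLines and the assumptions that the file is UTF-8 and smaller than 2 GB are mine.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class NioLines {

    static String[] readLines(Path file) throws IOException {
        String text;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumes the file fits in a single buffer (under 2 GB).
            ByteBuffer bytes = ByteBuffer.allocate((int) ch.size());
            while (bytes.hasRemaining() && ch.read(bytes) != -1) { /* keep reading */ }
            bytes.flip();
            text = StandardCharsets.UTF_8.decode(bytes).toString();
        }
        List<String> lines = new ArrayList<>();
        int start = 0;
        int nl;
        while ((nl = text.indexOf('\n', start)) >= 0) {
            // Drop a carriage return in front of the line feed, if there is one.
            int end = (nl > start && text.charAt(nl - 1) == '\r') ? nl - 1 : nl;
            lines.add(text.substring(start, end));
            start = nl + 1;
        }
        if (start < text.length()) {
            lines.add(text.substring(start)); // last line without a trailing newline
        }
        return lines.toArray(new String[0]);
    }

    static void writeLines(Path file, String[] lines) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        ByteBuffer bytes = StandardCharsets.UTF_8.encode(sb.toString());
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (bytes.hasRemaining()) {
                ch.write(bytes);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lines", ".txt");
        writeLines(file, new String[] {"one", "two", "three"});
        System.out.println(String.join(",", readLines(file)));
        Files.delete(file);
    }
}
```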

Here would be my first pass, assuming that memory is not an issue (ha).

  1. Get the file size as it sits on disk (File.length).
  2. Allocate a buffer of that size.
  3. Load the whole thing in one shot (InputStream.read(byte[])).
  4. Convert the buffer to a String and break it into substrings entirely in memory.
  5. Do Stuff (tm)
  6. Reverse the above to save.
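The steps above could be sketched roughly as follows. The class name WholeFileLines, the UTF-8 charset, and the newline-terminated layout are assumptions for the example. One deliberate deviation: a single InputStream.read(byte[]) call is allowed to return fewer bytes than requested, so the sketch uses DataInputStream.readFully to fill the buffer.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class WholeFileLines {

    static String[] load(File file) throws IOException {
        // Steps 1 and 2: size the buffer from the on-disk length (assumes under 2 GB).
        byte[] buf = new byte[(int) file.length()];
        // Step 3: load everything; readFully loops until the buffer is full.
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(buf);
        }
        // Step 4: decode once, then split entirely in memory.
        String text = new String(buf, StandardCharsets.UTF_8);
        String[] parts = text.split("\r?\n", -1);
        // A trailing newline (or an empty file) leaves one empty element at the end; drop it.
        if (parts[parts.length - 1].isEmpty()) {
            parts = Arrays.copyOf(parts, parts.length - 1);
        }
        return parts;
    }

    static void save(File file, String[] lines) throws IOException {
        // Step 6: build the whole content in memory and write it in one call.
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(sb.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("whole", ".txt");
        save(file, new String[] {"first", "second"});
        System.out.println(Arrays.toString(load(file)));
        file.delete();
    }
}
```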

Keep in mind that Java represents character data as UTF-16 internally, which means that your nice ASCII file can take twice its on-disk size in memory to account for the "expansion." E.g., a 4,124-byte foo.txt will be at least 8,248 bytes in memory. (Since Java 9, compact strings store Latin-1-only strings with one byte per character, so this doubling mainly applies to older JVMs and to char[] buffers.)
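The arithmetic can be checked with a small sketch that compares the encoded size of the same ASCII text in UTF-8 and in UTF-16. The class name EncodedSizes is made up for the example, and the numbers are encoded byte counts, not exact heap usage (object headers and the JVM's internal string representation are ignored).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodedSizes {

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Bytes(String s) {
        // UTF_16BE is used so that no byte-order mark is added to the count.
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        char[] chars = new char[4124];
        Arrays.fill(chars, 'x');              // 4,124 ASCII characters
        String text = new String(chars);
        System.out.println(utf8Bytes(text));  // one byte per ASCII character
        System.out.println(utf16Bytes(text)); // two bytes per character
    }
}
```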

Everything else is going to be slower, because it has to go through some sort of buffering and wrapping (in particular, to cope with not having enough memory to hold the whole file).

Good luck!
