I want to get the first ten thousand lines of a huge .csv file.
The naive way of
1) creating a reader & writer
2) reading the original file line by line
3) writing the first ten thousand lines to a new file
can't be the fastest, can it?
This will be a common operation in my app so I'm slightly concerned about speed, but also just curious.
Thanks.
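For reference, here is a minimal sketch of the naive approach described above (class and method names are my own, and `readLine()` strips line terminators, so the output uses the platform's line separator):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeadCopy {
    // Copy the first maxLines lines of src to dst, one line at a time.
    static void copyFirstLines(Path src, Path dst, int maxLines) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(src);
             BufferedWriter out = Files.newBufferedWriter(dst)) {
            String line;
            int count = 0;
            while (count < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                count++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("big", ".csv");
        Path dst = Files.createTempFile("head", ".csv");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20; i++) sb.append("row").append(i).append('\n');
        Files.write(src, sb.toString().getBytes());
        copyFirstLines(src, dst, 10);
        System.out.println(Files.readAllLines(dst).size()); // 10
    }
}
```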
There are a few ways of doing fast I/O in Java, but without benchmarking your particular case it's hard to give a definitive figure or recommendation. Here are a few approaches you can benchmark:
If you only want to read/write 10,000 lines or so, the simple line-at-a-time approach is probably adequate. Having said that, you can do better than reading a line at a time with BufferedReader.readLine().
Depending on the character encoding of the file, you will get better performance by doing byte-wise I/O with a BufferedInputStream and BufferedOutputStream with large buffer sizes. Just write a loop to read a byte, conditionally update the line counter and write the byte ... until you have copied the requisite number of lines. (This assumes that you can detect the CR and/or LF characters by examining the bytes. This is true for all character encodings I know about.)
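A sketch of that byte-wise loop, assuming an encoding where the '\n' byte is unambiguous (ASCII, UTF-8, ISO-8859-*; not UTF-16):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteHead {
    // Copy bytes from in to out until maxLines '\n' bytes have been written.
    // Large buffers mean the per-byte read()/write() calls rarely touch the OS.
    static void copyLinesBytewise(InputStream rawIn, OutputStream rawOut, int maxLines)
            throws IOException {
        try (BufferedInputStream in = new BufferedInputStream(rawIn, 1 << 16);
             BufferedOutputStream out = new BufferedOutputStream(rawOut, 1 << 16)) {
            int lines = 0, b;
            while (lines < maxLines && (b = != -1) {
                out.write(b);
                if (b == '\n') lines++;   // conditionally update the line counter
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a,1\nb,2\nc,3\nd,4\n".getBytes();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyLinesBytewise(new ByteArrayInputStream(data), sink, 2);
        System.out.println(sink.toString()); // the first two lines
    }
}
```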
If you use NIO and ByteBuffers, you can further reduce the amount of in-memory copying, though the CR / LF counting logic will be more complicated.
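One way to structure the NIO version (my own sketch, not a tuned implementation): scan the file in ByteBuffer chunks just to find the byte offset past the last wanted '\n', then hand the whole prefix to FileChannel.transferTo, which can avoid copying through user space:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioHead {
    // Find the offset just past the maxLines-th '\n', then transfer that prefix.
    static void copyFirstLinesNio(Path src, Path dst, int maxLines) throws IOException {
        try (FileChannel in =, StandardOpenOption.READ);
             FileChannel out =, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 16);
            long end = 0;   // number of bytes to copy
            int lines = 0;
            scan:
            while ( >= 0) {
                buf.flip();
                while (buf.hasRemaining()) {
                    end++;
                    if (buf.get() == '\n' && ++lines == maxLines) break scan;
                }
                buf.clear();
            }
            if (lines < maxLines) end = in.size(); // file has fewer lines than requested
            long pos = 0;
            while (pos < end) pos += in.transferTo(pos, end - pos, out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("nio", ".csv");
        Files.write(src, "x,1\ny,2\nz,3\n".getBytes());
        Path dst = Files.createTempFile("nio-head", ".csv");
        copyFirstLinesNio(src, dst, 2);
        System.out.println(Files.readAllLines(dst)); // the first two lines
    }
}
```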
But the first question you should ask is whether it is even worthwhile bothering to optimize this.
Are the lines the same length? If so, you can use RandomAccessFile to read x bytes and then write those bytes to a new file. It may be quite memory-intensive, though. I suspect this would be quicker, but it's probably worth benchmarking. This solution only works for fixed-length lines.
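A sketch of the fixed-length-record idea, assuming every line (including its newline) is exactly recordLen bytes, so the prefix is simply numLines * recordLen bytes. Reading it all at once is what makes this memory-intensive; a production version would copy in chunks:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class FixedHead {
    // Copy the first numLines records of src to dst in one bulk read/write.
    // Assumes fixed-length records of recordLen bytes each, newline included.
    static void copyFixedLines(String src, String dst, int recordLen, int numLines)
            throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(src, "r");
             RandomAccessFile out = new RandomAccessFile(dst, "rw")) {
            long want = (long) recordLen * numLines;
            byte[] chunk = new byte[(int) Math.min(want, in.length())];
            in.readFully(chunk);
            out.write(chunk);
            out.setLength(chunk.length); // truncate any previous contents
        }
    }

    public static void main(String[] args) throws IOException {
        // Each record "xx,N\n" is exactly 5 bytes.
        Path src = Files.createTempFile("fix", ".csv");
        Files.write(src, "aa,1\nbb,2\ncc,3\n".getBytes());
        Path dst = Files.createTempFile("fix-head", ".csv");
        copyFixedLines(src.toString(), dst.toString(), 5, 2);
        System.out.println(new String(Files.readAllBytes(dst))); // first two records
    }
}
```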