
BufferedReader & FileReader read() Performance - Large Text File

I'm using the following two pieces of code to read a large file.

This one uses a FileReader:

import java.io.File;
import java.io.FileReader;

File file = new File("/Users/Desktop/shakes.txt");
FileReader reader = new FileReader(file);

int ch;
long start = System.currentTimeMillis();
while ((ch = reader.read()) != -1) {
    System.out.print((char) ch);
}
long end = System.currentTimeMillis();
reader.close();
System.out.println("Elapsed: " + (end - start) + " ms");

And this one uses a BufferedReader:

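import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

File file = new File("/Users/Desktop/shakes.txt");
BufferedReader reader = new BufferedReader(new FileReader(file));

int ch;
long start = System.currentTimeMillis();
while ((ch = reader.read()) != -1) {
    System.out.print((char) ch);
}
long end = System.currentTimeMillis();
reader.close();
System.out.println("Elapsed: " + (end - start) + " ms");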

According to the documentation for BufferedReader:

It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders. Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
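To make the Javadoc's point concrete, here is a minimal sketch of the idea behind buffering. ChunkedCharSource is a hypothetical class written only for illustration, not a JDK type: it asks the underlying Reader for a whole block at once with read(char[], int, int) and then hands out characters from its own array, so most read() calls never touch the underlying Reader. The real BufferedReader does more than this (readLine(), mark/reset, skipping, and so on).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical illustration of what buffering buys: one bulk read from the
// underlying Reader serves many subsequent single-character read() calls.
class ChunkedCharSource {
    private final Reader in;
    private final char[] buf;
    private int pos = 0;     // index of the next char to hand out
    private int limit = 0;   // number of valid chars currently in buf
    private int refills = 0; // how many bulk reads went to the underlying Reader

    ChunkedCharSource(Reader in, int bufferSize) {
        this.in = in;
        this.buf = new char[bufferSize];
    }

    // Same contract as Reader.read(): the next char as an int, or -1 at end of stream.
    int read() {
        if (pos >= limit) {
            try {
                int n = in.read(buf, 0, buf.length); // one bulk call to the underlying Reader
                refills++;
                if (n == -1) {
                    return -1;
                }
                limit = n;
                pos = 0;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return buf[pos++];
    }

    int refills() {
        return refills;
    }
}
```

With a buffer of 8192 characters, a file of a few megabytes needs fewer than a thousand bulk reads from the underlying Reader instead of millions of single-character calls.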

Given this documentation and BufferedReader's default buffer size of 8192 characters, shouldn't reading the file with BufferedReader be faster overall? Currently, both pieces of code run in ~3000 ms on my machine. However, if I use readLine() with the BufferedReader, performance improves substantially (~200 ms).

Thoughts on something that I'm missing? Shouldn't BufferedReader perform better than a plain FileReader even when using the read() method?
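The question doesn't show the readLine() version. A plausible reconstruction, assuming each line is printed with System.out.println(), is sketched below; ReadLineVersion and copyLines are hypothetical names. If that assumption holds, the loop makes one print call per line instead of one per character, which by itself would likely shrink the console overhead that the answers below identify as the bottleneck.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReadLineVersion {
    // Hypothetical helper: copies the input to `out` one line at a time and
    // returns the number of lines. One print call per line, not per character.
    static int copyLines(Reader source, PrintStream out) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int count = 0;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator; println() appends the platform one
                out.println(line);
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Called as copyLines(new FileReader("/Users/Desktop/shakes.txt"), System.out), it would print the file and return its line count.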

Using BufferedReader is indeed faster than using just FileReader.

I ran your code on my machine with this text file: https://norvig.com/big.txt (6 MB).

  • The initial results show roughly the same time for both: about 17 seconds each.
  • However, this is because System.out.print() inside the loop is the bottleneck. Without the print, BufferedReader is 4 times faster: about 200 ms with FileReader vs 50 ms with BufferedReader (compared with 17 s before).

In other words, don't put System.out.print() inside the code you are benchmarking.
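A minimal sketch of timing only the reading, with nothing printed inside the loop. ReadCharBenchmark, countChars, and timeMillis are hypothetical names, and the default path is the one from the question. The sketch uses System.nanoTime() instead of currentTimeMillis(), since nanoTime() is intended for measuring elapsed time. A single pass like this is still affected by JIT warm-up and OS file caching, so treat the numbers as rough.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

class ReadCharBenchmark {
    // Reads every char with read() and returns how many there were.
    // Nothing is printed inside the loop, so only the reading is measured.
    static long countChars(Reader reader) {
        try (Reader r = reader) {
            long count = 0;
            while (r.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Times one full pass over an already-opened reader, in milliseconds.
    static long timeMillis(Reader reader) {
        long start = System.nanoTime();
        countChars(reader);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/Users/Desktop/shakes.txt";
        System.out.println("FileReader:     " + timeMillis(new FileReader(path)) + " ms");
        System.out.println("BufferedReader: " + timeMillis(new BufferedReader(new FileReader(path))) + " ms");
    }
}
```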

Example

An improved benchmark could collect the characters in a StringBuilder and print them only after the timer stops:

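import java.io.File;
import java.io.FileReader;

File file = new File("/Users/Desktop/shakes.txt");
FileReader reader = new FileReader(file);

int ch;
StringBuilder sb = new StringBuilder();
long start = System.currentTimeMillis();
while ((ch = reader.read()) != -1) {
    // Collect the characters in memory instead of calling System.out.print() in the timed loop
    sb.append((char) ch);
}
long end = System.currentTimeMillis();
reader.close();

// Print only after the timer has stopped
System.out.println(sb);
System.out.println("Elapsed: " + (end - start) + " ms");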

The code above produces the same output but runs much faster. Because printing now happens outside the timed loop, the measurement shows the difference a BufferedReader makes much more accurately.

Thoughts on something that I'm missing?

Reading a file one character at a time should be faster from a BufferedReader than from a FileReader (by orders of magnitude), so I suspect the problem is in your benchmarks.

  1. Your benchmark measures both reading the file and writing it to standard output, so your performance figures are distorted by the cost of writing the output. If that output goes to a "console", the cost also includes painting characters on the screen... and scrolling.

  2. Your benchmark takes no account of JVM startup overheads.

  3. Your benchmark doesn't (obviously) take account of file caching. (The first time a file is read, it is read from disk. If you read it again soon afterwards, you may be reading a copy of the file that the operating system has cached in memory, which is faster.)
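The three points above can be addressed in a small harness, sketched here under some assumptions: RepeatedReadBenchmark and ReaderFactory are hypothetical names, and the warm-up and run counts (3 and 5) are arbitrary choices. Nothing is printed while timing (point 1); untimed warm-up passes absorb class-loading and JIT costs (point 2) and pull the file into the OS cache, so the timed passes most likely all read from memory (point 3); and the median of several runs is reported. For serious measurements, a dedicated harness such as JMH (the OpenJDK Java Microbenchmark Harness) is the usual tool.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

class RepeatedReadBenchmark {
    // Hypothetical factory so one harness can time both reader setups.
    interface ReaderFactory {
        Reader open(String path) throws IOException;
    }

    // Reads the whole file one char at a time; no output inside the loop (point 1).
    static long readAll(ReaderFactory factory, String path) throws IOException {
        long count = 0;
        try (Reader r = factory.open(path)) {
            while (r.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Untimed warm-up passes first (points 2 and 3), then `runs` timed passes.
    // Returns the median time in milliseconds (upper median if `runs` is even).
    static double medianMillis(ReaderFactory factory, String path, int warmups, int runs) throws IOException {
        for (int i = 0; i < warmups; i++) {
            readAll(factory, path);
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            readAll(factory, path);
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos[runs / 2] / 1e6;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/Users/Desktop/shakes.txt";
        ReaderFactory plain = FileReader::new;
        ReaderFactory buffered = p -> new BufferedReader(new FileReader(p));
        System.out.printf("FileReader:     %.1f ms%n", medianMillis(plain, path, 3, 5));
        System.out.printf("BufferedReader: %.1f ms%n", medianMillis(buffered, path, 3, 5));
    }
}
```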

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
