
What would be the fastest way to read integers from a file in Java?

I have a file of integers arranged like this:

1 2 3 55 22 11 (and so on)

And I want to read in these numbers as fast as possible to lessen the total execution time of my program. So far, I am using a Scanner with good results. However, I get the feeling that there exists a faster I/O utility I can use. Can anyone please point me in the right direction?

EDIT:

So yes, I verified that it is the I/O in my program that's taking the most time, by setting up timers around different parts of the Java code and comparing the results.

Current file format

If the numbers are represented as Strings, there is no faster way to read and parse them; disk I/O is going to be orders of magnitude slower than anything the CPU is doing. The only thing you can do is use a BufferedReader with a huge buffer size and try to get as much of the file (if not all of it) into memory before handing it to Scanner.
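For example, here is a minimal sketch of that approach (the 1 MB buffer size and the file name input.txt are illustrative assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadTextInts {
    public static void main(String[] args) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        // A large buffer (1 MB here) lets Scanner pull from memory instead of
        // hitting the disk on every token.
        try (Scanner scanner = new Scanner(
                new BufferedReader(new FileReader("input.txt"), 1 << 20))) {
            while (scanner.hasNextInt()) {
                numbers.add(scanner.nextInt());
            }
        }
        System.out.println("Read " + numbers.size() + " ints");
    }
}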

Alternate file format

If you can represent the numbers as binary in the file and read them in using the DataInputStream class, then you might get a small decrease in I/O time and a marginal CPU decrease because you don't need to parse the String representation into an int, though neither would probably be measurable unless your input file is hundreds of megabytes or larger. Buffering the input stream will still have more effect than anything else; use a BufferedInputStream in this case.
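A minimal sketch of that style of reading, assuming the file was written with DataOutputStream.writeInt and the name input.dat is hypothetical:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadBinaryInts {
    public static void main(String[] args) throws IOException {
        List<Integer> numbers = new ArrayList<>();
        // BufferedInputStream does the heavy lifting: it reads the file in
        // large chunks so each readInt() is served from memory.
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("input.dat"), 1 << 20))) {
            while (true) {
                try {
                    numbers.add(in.readInt()); // four big-endian bytes per int
                } catch (EOFException eof) {
                    break; // DataInputStream signals end of file this way
                }
            }
        }
        System.out.println("Read " + numbers.size() + " ints");
    }
}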

How to optimize

You need robust profiling to even detect whether any change you make affects performance positively or negatively.

Things like OS disk caching will skew benchmarks: if you read the same file over and over, the OS will cache it and distort your results. Learn what good enough is sooner rather than later.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth

The premature part of Knuth's quote is the important part. It means:

Don't optimize without profiling and benchmarks to verify that what you are changing is actually a bottleneck and that you can measure the positive or negative impact of your changes.

Here is a quick benchmark comparing a BufferedInputStream reading a set of numbers as binary versus a Scanner backed by a BufferedReader reading the same set of numbers as space-delimited text.
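(The exact benchmark code is not reproduced on this page; the sketch below shows roughly how such a harness could look. The file names, the random data, and the fixed seed are assumptions for illustration.)

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;
import java.util.Scanner;

public class ReadBench {
    static final int COUNT = 1_000_000; // adjust per run (1K / 1M / 50M)

    public static void main(String[] args) throws IOException {
        writeFiles();

        // Time the binary read.
        long start = System.currentTimeMillis();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("input.dat")))) {
            for (int i = 0; i < COUNT; i++) {
                in.readInt();
            }
        }
        System.out.printf("Read binary file in %04d ms%n",
                System.currentTimeMillis() - start);

        // Time the text read.
        start = System.currentTimeMillis();
        try (Scanner scanner = new Scanner(
                new BufferedReader(new FileReader("input.txt")))) {
            while (scanner.hasNextInt()) {
                scanner.nextInt();
            }
        }
        System.out.printf("Read text file in   %04d ms%n",
                System.currentTimeMillis() - start);
    }

    // Write the same COUNT random ints as raw binary and as space-delimited text.
    static void writeFiles() throws IOException {
        Random rnd = new Random(42);
        try (DataOutputStream bin = new DataOutputStream(
                     new BufferedOutputStream(new FileOutputStream("input.dat")));
             PrintWriter txt = new PrintWriter(
                     new BufferedWriter(new FileWriter("input.txt")))) {
            for (int i = 0; i < COUNT; i++) {
                int n = rnd.nextInt(1_000_000);
                bin.writeInt(n);
                txt.print(n);
                txt.print(' ');
            }
        }
    }
}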

The results are pretty consistent:

For 1,000 numbers on my Core i3 laptop with 8GB of RAM

Read binary file in 0001 ms
Read text file in   0041 ms

For 1,000,000 numbers on my Core i3 laptop with 8GB of RAM

Read binary file in 0603 ms
Read text file in   1509 ms

For 50,000,000 numbers on my Core i3 laptop with 8GB of RAM

Read binary file in 29020 ms
Read text file in   70346 ms

File sizes for the 50,000,000 numbers were as follows:

 48M input.dat
419M input.txt

Reading the binary is much faster, though the gap narrows as the set of numbers grows very large. I/O on binary-encoded ints is lower (by about 10 times, per the file sizes above), there is no String parsing logic, and there is none of the object-creation overhead and whatever else Scanner does. I used the Buffered versions of the InputStream and Reader classes because buffering is a best practice and should be used whenever possible.

For extra credit, compression would reduce the I/O wait even more on the large files with almost no measurable effect on the CPU time.

Generally you can read the data as fast as the disk allows. The best way to read it faster is to make it more compact or get a faster disk.

For the format you are using, I would GZip the files and read the compressed data. This is a simple way to increase the rate at which you can read the underlying data.
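For example, assuming the text file has been compressed with gzip into input.txt.gz (a hypothetical name), a sketch of reading it on the fly:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;

public class ReadGzippedInts {
    public static void main(String[] args) throws IOException {
        long sum = 0;
        // GZIPInputStream decompresses on the fly, so the disk only has to
        // deliver the much smaller compressed bytes.
        try (Scanner scanner = new Scanner(new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("input.txt.gz")))))) {
            while (scanner.hasNextInt()) {
                sum += scanner.nextInt();
            }
        }
        System.out.println("Sum of ints read: " + sum);
    }
}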

Escalation possibilities:

  • Buy a faster disk.
  • Buy an SSD.
  • Store the file on a RAM disk.

There is always a trade-off in gaining more performance/speed. The methods above cost money and have to be applied on every host, so if this is a program sold to multiple customers, it may be a better option to tune the algorithm instead, which saves money on every host the program runs on.

If you compress the file or store binary data, reading gets faster, but it becomes harder to inspect the data with independent tools. Of course, we cannot tell how often that will be necessary.

In most circumstances I would suggest keeping the data human-readable and living with a slower program, but of course it depends on how much time you lose, how often you lose it, and so on.

And maybe it is just an exercise to find out how fast you can get. Even then, I would warn against the habit of always reaching for the highest performance without considering the trade-offs and the costs.
