
Java - Using multiple threads to read/write to memory mapped buffers (MappedByteBuffer)

I have an application where a lot of file I/O (both reads and writes) takes place. I know that using multiple threads to do file I/O is not a good solution, as it can degrade performance (I have no control over the kind of disk that is used), so I ended up dedicating one thread to all file I/O. Can MappedByteBuffer be of any use in my case? I know that a MappedByteBuffer is a memory area which is mapped to a file by the OS; can I leverage multiple threads to do I/O operations on different memory-mapped buffers efficiently? Do disk head seek times still matter when multiple threads map different files to different memory buffers? Is consistency guaranteed in such cases? Are there any benchmark results available for such cases? Thank you all in advance.

Can MappedByteBuffer be of any use in my case?

Referring to the JavaDoc, a MappedByteBuffer should give you no performance advantage compared to a ByteBuffer. You could even end up with some unexpected changes at runtime:

The content of a mapped byte buffer can change at any time, for example if the content of the corresponding region of the mapped file is changed by this program or another.
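For context, here is a minimal sketch of how a mapped buffer is typically obtained via FileChannel.map; the file name and the 4 KiB mapping size are just placeholders:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MapExample {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("data.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Map the first 4 KiB of the file into memory (grows the file if needed).
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buffer.putInt(0, 42);                 // goes to the page cache, not directly to disk
            System.out.println(buffer.getInt(0)); // reads back through the mapping
        }
    }
}
```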


I know that a MappedByteBuffer is a memory area which is mapped to a file by the OS; can I leverage multiple threads to do I/O operations on different memory-mapped buffers efficiently?

Unless you know how to read and write your data more efficiently than your OS or the JVM does, this is not the case.


Do disk head seek times still matter when multiple threads map different files to different memory buffers?

The head still has to seek to its position. Unless you have multiple disks, having more than one thread is useless if all you are doing is disk I/O. If there is some redundancy in the data you read, multithreading can be useful, because the OS will cache "hot" data.


Is consistency guaranteed in such cases?

I'm not really sure what you mean, but you have to make sure that access to your ByteBuffer is synchronized in some way, because it is not a thread-safe data structure.
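For example, one way to handle this (a sketch, not the poster's code) is to either lock around every access to the shared buffer, or hand each thread its own duplicate() view, since a duplicate shares the bytes but has its own position and limit:

```java
import java.nio.ByteBuffer;

public class BufferSharing {
    private final ByteBuffer shared = ByteBuffer.allocate(1024);

    // Option 1: guard every access with a lock, because position/limit
    // are mutable state inside the buffer.
    public synchronized void appendInt(int value) {
        shared.putInt(value);
    }

    // Option 2: give each thread its own view; duplicate() shares the
    // underlying bytes but has an independent position, limit, and mark.
    public ByteBuffer viewForThread() {
        return shared.duplicate();
    }
}
```

Note that even with separate views, two threads writing to overlapping byte ranges still need their own coordination.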


Are there any benchmark results available for such cases?

Last year I did some benchmarking with multiple buffers. Long story short, it really depends on the use case, the operating system, and your hardware. Depending on how important this is, I would recommend that you do your own benchmarks. The only constant I remember is that you get the best performance when writing data in blocks matching your disk's block size... which is somewhat obvious ;-)
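As an illustration of that block-sized-write idea, here is a sketch that writes a file one fixed-size block at a time; the 4096-byte block size and the file name are assumptions you would replace with values measured on your own system:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BlockSizedWrites {
    public static void main(String[] args) throws IOException {
        final int blockSize = 4096; // assumed block size; measure your own disk
        ByteBuffer block = ByteBuffer.allocateDirect(blockSize);

        try (FileChannel out = FileChannel.open(Paths.get("out.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < 256; i++) {
                block.clear();
                while (block.hasRemaining()) {
                    block.put((byte) i);   // fill one full block of payload
                }
                block.flip();
                out.write(block);          // write exactly one block at a time
            }
        }
    }
}
```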

So long as you're not attempting to have more than one thread write to the same file at a given time, there's no problem with doing file I/O from different threads. With NIO, the FileSystem implementation is far better at managing disk writes and resources than you could ever hope to be. Disk writes are buffered and asynchronous by default in Java, so there's no need to do something as convoluted as making a single thread do all your I/O and write into memory buffers; that is almost exactly what OutputStreams writing to disk already do, and the JVM will do it more efficiently than you could.
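A minimal sketch of that approach, assuming each task writes its own file so no two threads ever touch the same file at once (the file names and payload size are made up):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFileWrites {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int id = i;
            pool.submit(() -> {
                byte[] payload = new byte[1024 * 1024]; // 1 MiB of zeros as dummy data
                try {
                    // Each task writes its own file, so no two threads
                    // ever write to the same file concurrently.
                    Files.write(Paths.get("part-" + id + ".bin"), payload);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```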

In fact, file I/O operations can benefit substantially from multithreading. Some threads can be processing data that has already been read while other threads are still reading, and it can even sometimes be faster to read or write a few files in parallel than sequentially.

If you're suggesting that you want to map separate regions of the same file to different MappedByteBuffers, and want to compare writing the file that way to single-threaded, blocking, unbuffered writes to the same file, I'm pretty sure that you'll be very happy with the results from a performance perspective.
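A sketch of what that could look like, with each thread filling its own non-overlapping region of one file; the region size, thread count, and file name are arbitrary assumptions:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RegionPerThread {
    public static void main(String[] args) throws IOException, InterruptedException {
        final long regionSize = 16L * 1024 * 1024;   // 16 MiB per thread (assumed)
        final int threads = 4;

        try (FileChannel channel = FileChannel.open(Paths.get("big.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int i = 0; i < threads; i++) {
                // Each thread gets its own non-overlapping mapping of the same file.
                MappedByteBuffer region =
                        channel.map(FileChannel.MapMode.READ_WRITE,
                                    i * regionSize, regionSize);
                pool.submit(() -> {
                    while (region.hasRemaining()) {
                        region.put((byte) 1);        // fill this region only
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.MINUTES);
        }
    }
}
```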

You should remember that when you write to a MappedByteBuffer, you are not necessarily writing to the disk at the moment you perform the write. The OS is responsible for deciding which mapped regions are resident in RAM and when that RAM is written back to disk. Typically that means that while you are writing, the file (or the portion of it you touched) is kept in RAM and is written back to disk at the discretion of the OS: it may be kept in memory until it looks like you're done writing it and then moved to disk, or it may be kept in RAM until that memory is needed for something else, unless you force() it to be written out to disk.
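For illustration, force() is the call that asks the OS to push the dirty pages of a mapped region back to the storage device before returning; a minimal sketch (file name and sizes are placeholders):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ForceExample {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("journal.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buffer.putLong(0, System.currentTimeMillis());
            // Without force(), the OS decides when the dirty pages reach the disk.
            // force() requests that the mapped region be written out before returning.
            buffer.force();
        }
    }
}
```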

I think, from a performance perspective, it depends a lot on what your goal is. Do you want the algorithm that does the writing to finish faster? In that case the memory-mapped regions may well be a good option, because the algorithm can finish before the file finishes writing to disk. Or do you want the file copied to the disk faster? In that case it's hard to say: if you are able to break the file into nice large chunks that can be efficiently written to disk, and if the OS is able to recognize when you're done with a region and writes each region back to disk only once during the process, it may be more efficient.

On the other hand, your current implementation may already be writing to disk very efficiently: you may be arranging the writes so that little seeking is necessary (if you are using hard disks), and buffering them appropriately, so that you are neither forcing the OS to push small pieces of the file all the way to disk before it gets the next piece, nor writing bytes at random (which even solid-state drives do not like, since they must write a region of a certain size and cannot write single bytes individually). In that case it's entirely possible that your current strategy will finish writing the file to disk faster, assuming that getting the file onto the physical disk as fast as possible is the goal.

If you want to know how much room for improvement there is, you could compare your speed with the results of a hard-drive performance test on your system, which should give you the limit of your throughput to the disk. If that limit is significantly faster than your current implementation, either there's room for improvement in your writing strategy, or it's generating the data, rather than writing it, that's taking the time.

To test the latter, you could try having your algorithm write to ByteBuffers that are not memory mapped; with no file I/O, you can benchmark the speed of your algorithms independently of the disk.
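A rough sketch of that idea, timing a stand-in for the data-generating algorithm against a plain, non-mapped buffer (the buffer size and the generateData placeholder are hypothetical); for serious measurements a harness such as JMH would be more reliable:

```java
import java.nio.ByteBuffer;

public class AlgorithmOnlyBenchmark {
    public static void main(String[] args) {
        // A heap buffer that never touches the file system,
        // so only the data-generating code is being timed.
        ByteBuffer sink = ByteBuffer.allocate(64 * 1024 * 1024); // 64 MiB

        long start = System.nanoTime();
        generateData(sink);                       // hypothetical stand-in for your algorithm
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Generation took " + elapsedMs + " ms for "
                + sink.position() + " bytes");
    }

    // Placeholder for the real data-producing algorithm.
    private static void generateData(ByteBuffer out) {
        while (out.hasRemaining()) {
            out.put((byte) (out.position() & 0xFF));
        }
    }
}
```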
