
Fast random read/write access on big files in Java

I'm creating endgame database files for a game. Multiple threads compute the results of game positions, saving each result at the appropriate place in the file, but also consulting the database built so far in order to speed up the computation.

Until now I've just kept a byte[] in memory, but this morning it crashed while trying to create a file with more than Integer.MAX_VALUE bytes.

I'm considering two solutions:

  • a wrapper using multiple byte arrays (sketched at the end of this question)
  • random file access

Random file access would also be nice since it isn't limited by my RAM. My hope is that the operating system (Windows 10 / Linux Mint 20) caches most of the file in RAM, so that it's about as fast as a byte[] for files which fit into RAM entirely, and not too terrible otherwise (I have a very fast SSD).

Would that work or should I not even bother?
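
For reference, the first option could look roughly like the sketch below: a hypothetical BigByteArray class (the name and chunk size are my own, not from the post) that splits the data across several byte[] chunks and indexes them with a long.

    // Minimal sketch: a long-indexed byte store backed by multiple byte[] chunks,
    // each safely below Integer.MAX_VALUE in size.
    public final class BigByteArray {
        private static final int CHUNK_SHIFT = 30;                 // 1 GiB chunks
        private static final int CHUNK_SIZE  = 1 << CHUNK_SHIFT;
        private static final long CHUNK_MASK = CHUNK_SIZE - 1;

        private final byte[][] chunks;

        public BigByteArray(long size) {
            int chunkCount = (int) ((size + CHUNK_SIZE - 1) >>> CHUNK_SHIFT);
            chunks = new byte[chunkCount][];
            long remaining = size;
            for (int i = 0; i < chunkCount; i++) {
                chunks[i] = new byte[(int) Math.min(CHUNK_SIZE, remaining)];
                remaining -= chunks[i].length;
            }
        }

        public byte get(long index) {
            return chunks[(int) (index >>> CHUNK_SHIFT)][(int) (index & CHUNK_MASK)];
        }

        public void set(long index, byte value) {
            chunks[(int) (index >>> CHUNK_SHIFT)][(int) (index & CHUNK_MASK)] = value;
        }
    }

Note that, as with a plain byte[], concurrent writers still need external synchronization (or some other happens-before edge) for their updates to become visible to other threads.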

There are a few primary options for dealing with large files in Java.

RandomAccessFile API

Pro:

  • Simple, though old-school, API

Con:

  • The API is call-heavy (both Java calls and syscalls) and suboptimal from a performance point of view.

This API may still be good enough for simple read-only or write-only cases.
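
A minimal sketch of positional access with RandomAccessFile (file name and offsets are placeholders, not from the question):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class RafDemo {
        public static void main(String[] args) throws IOException {
            // "rw" opens the file for reading and writing; offsets are longs,
            // so the file can be larger than Integer.MAX_VALUE bytes.
            try (RandomAccessFile raf = new RandomAccessFile("endgame.db", "rw")) {
                long position = 5_000_000_000L;      // some offset beyond 2 GiB

                raf.seek(position);
                raf.writeByte(42);                   // store one result byte

                raf.seek(position);
                int value = raf.readUnsignedByte();
                System.out.println("value at " + position + " = " + value);
            }
        }
    }

Note that seek() followed by a read or write is not atomic, so with multiple threads you either need external locking or one RandomAccessFile instance per thread.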

FileChannel API

Compared to RandomAccessFile, this class provides a buffer-oriented API, which can be an advantage for both performance and code organization.

FileChannel also supports concurrent, positional reads and writes from multiple threads (and there is AsynchronousFileChannel for asynchronous IO), which matters when you want to max out your disk I/O.

Additionally, FileChannel offers zero-copy data transfer (file-to-file, file-to-socket, socket-to-file) via transferTo()/transferFrom().
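
A minimal sketch of positional reads and writes through FileChannel (again, file name and offset are placeholders):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class FileChannelDemo {
        public static void main(String[] args) throws IOException {
            try (FileChannel channel = FileChannel.open(Path.of("endgame.db"),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.READ,
                    StandardOpenOption.WRITE)) {

                long position = 5_000_000_000L;          // offset beyond 2 GiB

                // Positional write: does not move the channel's position, so
                // several threads can issue these calls on the same channel.
                channel.write(ByteBuffer.wrap(new byte[] { 42 }), position);

                // Positional read back into a small buffer.
                ByteBuffer in = ByteBuffer.allocate(1);
                channel.read(in, position);
                System.out.println("value at " + position + " = " + in.get(0));
            }
        }
    }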

Memory mapped buffers

Memory-mapped buffers are another IO option, available via FileChannel.map().

In theory, memory mapping has the lowest possible overhead for disk access, but in practice its performance is usually on par with FileChannel.

Memory-mapped buffers bring a number of problems, though:

  • A memory-mapped ByteBuffer can only be unmapped by the GC, so the underlying file mapping stays open for an unpredictable amount of time (especially painful on Windows, where a mapped file cannot be deleted).
  • Memory-mapped accesses cause page faults that interfere with the JVM's internal threads; as a consequence, an application doing memory-mapped IO can experience frequent stop-the-world (STW) stalls.
  • Each memory-mapped buffer is limited to 2 GiB, so managing multiple buffers is required (see the chunked-mapping sketch after this section). These buffers MUST be reused, as they cannot be explicitly closed.

Memory-mapped buffers may be a good choice if you are working with DB-like data structures, have a hard-to-predict access pattern, and want to rely on OS caching instead of your own buffer management.

Still, the limitations mentioned above and the lack of clear performance benefits make memory-mapped IO a fairly niche solution in the Java world.
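
If you do go the memory-mapped route, the chunked-mapping sketch referenced above could look roughly like this (the 1 GiB chunk size, file size and file name are arbitrary choices for illustration):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MappedDemo {
        private static final long CHUNK_SIZE = 1L << 30;    // 1 GiB per mapping

        public static void main(String[] args) throws IOException {
            long fileSize = 5L << 30;                        // e.g. a 5 GiB file

            try (FileChannel channel = FileChannel.open(Path.of("endgame.db"),
                    StandardOpenOption.CREATE,
                    StandardOpenOption.READ,
                    StandardOpenOption.WRITE)) {

                // Map the file as a series of regions, since one MappedByteBuffer
                // cannot exceed Integer.MAX_VALUE bytes. A READ_WRITE mapping grows
                // the file to the requested size if it is smaller.
                int chunkCount = (int) ((fileSize + CHUNK_SIZE - 1) / CHUNK_SIZE);
                MappedByteBuffer[] chunks = new MappedByteBuffer[chunkCount];
                for (int i = 0; i < chunkCount; i++) {
                    long offset = i * CHUNK_SIZE;
                    long length = Math.min(CHUNK_SIZE, fileSize - offset);
                    chunks[i] = channel.map(FileChannel.MapMode.READ_WRITE, offset, length);
                }

                long position = 4_500_000_000L;              // somewhere past 4 GiB
                int chunk   = (int) (position / CHUNK_SIZE);
                int inChunk = (int) (position % CHUNK_SIZE);

                chunks[chunk].put(inChunk, (byte) 42);       // write through the mapping
                System.out.println(chunks[chunk].get(inChunk));
            }
        }
    }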

MappedByteBuffer and RandomAccessFile both hit 2.1 GB limits (Integer.MAX_VALUE bytes) in my setup, presumably because an int-indexed buffer or array sits underneath. What did the trick for me was the experimental MemorySegment API:

MemorySegment.mapFromPath: https://docs.oracle.com/en/java/javase/15/docs/api/jdk.incubator.foreign/jdk/incubator/foreign/MemorySegment.html#mapFromPath(java.nio.file.Path,long,long,java.nio.channels.FileChannel.MapMode)

It seems to be fine and fast, but I needed to pass additional parameters to the JVM :-/
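
For completeness, a rough sketch against the JDK 15 incubator API the link above refers to. The foreign-memory API changed in every release after that, so the exact calls here (mapFromPath, MemoryHandles.varHandle, baseAddress/addOffset) are tied to JDK 15 and should be treated as assumptions; the file name and sizes are placeholders, and the extra JVM parameter mentioned is presumably at least --add-modules jdk.incubator.foreign.

    import java.io.IOException;
    import java.lang.invoke.VarHandle;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;

    import jdk.incubator.foreign.MemoryHandles;
    import jdk.incubator.foreign.MemorySegment;

    public class SegmentDemo {
        public static void main(String[] args) throws IOException {
            long fileSize = 5L << 30;    // 5 GiB, well past Integer.MAX_VALUE

            // Map the whole file as one long-indexed segment (assumes the file
            // already exists with at least this size).
            try (MemorySegment segment = MemorySegment.mapFromPath(
                    Path.of("endgame.db"), 0L, fileSize, FileChannel.MapMode.READ_WRITE)) {

                VarHandle byteHandle =
                        MemoryHandles.varHandle(byte.class, ByteOrder.nativeOrder());

                long position = 4_500_000_000L;
                byteHandle.set(segment.baseAddress().addOffset(position), (byte) 42);
                byte value = (byte) byteHandle.get(segment.baseAddress().addOffset(position));
                System.out.println("value at " + position + " = " + value);
            }
        }
    }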
