
Java mapped FileChannel implementation

Original post: 2012-12-16 04:35:11 | Tags: java, performance, memory-mapped-files

Reading a file in using a mapped FileChannel seems to be lightning fast... but I was wondering how they are doing this?

Are they simply reading in a large (~64 kB) buffer and then letting me march through that? Or is there more to it?

I'm just impressed by the speed and want to better understand the algorithm behind it.

2 answers

They don't read anything until you do; then the piece you read is basically read in via the OS paging system. The open may cost you almost nothing, but repeated reads of the same piece of the file may cause repeated I/O. Nothing is free.
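To make the lazy behaviour described above concrete, here is a minimal sketch. The class and method names (`MappedReadDemo`, `sumBytes`, `writeTempFile`) are made up for illustration, and the sketch assumes the file is smaller than 2 GB so that it fits in a single `MappedByteBuffer`. The call to `map()` sets up the mapping; the file contents are typically only brought in by the OS when the loop touches each page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of any library.
final class MappedReadDemo {

    /** Maps the whole file read-only and sums its bytes as unsigned values. */
    static long sumBytes(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() establishes the mapping; it does not copy the file onto the Java heap.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buffer.hasRemaining()) {
                // Touching a page that is not yet resident is what usually causes
                // the OS to load it (from disk or from its page cache).
                sum += buffer.get() & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: writes the given bytes to a temporary file. */
    static Path writeTempFile(byte[] data) {
        try {
            Path file = Files.createTempFile("mapped-demo", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile(new byte[] {1, 2, 3, 4});
        System.out.println(sumBytes(file));
    }
}
```

The mapping stays valid after the channel is closed, which is why the try-with-resources block can own the channel while the buffer is still being read inside it.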

Memory mapping maps the file into your process's memory, and Java provides a library to wrap this so you can access it relatively safely.

Its benefits include:

There is only one copy in memory, shared by the OS disk cache and your application's (or applications') memory.
You can access random areas of the file without a system call.
Java does not limit how much you can map in by heap or direct memory size, i.e. if your maximum heap is 1 GB and your maximum direct memory is 1 GB, you can still map in 1 TB.
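As a rough illustration of the random-access point, the sketch below maps a temporary file of fixed-size int records and reads arbitrary records with absolute `getInt(index)` calls, with no seek or read call per access. The class name `MappedRandomAccess`, the method `squaresAt`, and the record layout are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical demo class; not part of any library.
final class MappedRandomAccess {

    /**
     * Creates a temp file holding recordCount ints (record i stores i * i),
     * then reads back the records at the requested indices in any order.
     */
    static int[] squaresAt(int recordCount, int... indices) {
        try {
            Path file = Files.createTempFile("mapped-random", ".bin");
            file.toFile().deleteOnExit();
            // Pre-size the file so the mapping covers existing bytes.
            Files.write(file, new byte[recordCount * Integer.BYTES]);

            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());

                // Absolute puts: each write is a plain memory store into the mapping.
                for (int i = 0; i < recordCount; i++) {
                    buffer.putInt(i * Integer.BYTES, i * i);
                }

                // Absolute gets: jump to any record without repositioning the channel.
                int[] result = new int[indices.length];
                for (int k = 0; k < indices.length; k++) {
                    result[k] = buffer.getInt(indices[k] * Integer.BYTES);
                }
                return result;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squaresAt(1000, 999, 0, 500)));
    }
}
```

A single `MappedByteBuffer` is indexed with an `int`, so a file larger than 2 GB would need several mappings; that is a limit per buffer, not a limit tied to heap size.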

Its disadvantages include:

It consumes virtual memory, which it doesn't give back when you re-map or close the files; the virtual memory is only released when the GC runs. This is not such a problem if you have a 64-bit JVM, but it is very limiting if you have a 32-bit JVM, which might only have 1 GB free.
It reads/writes a minimum of a page at a time. This can be good if you have lots of random access, but it can actually slow sequential access if you are reading from or writing to many files on disk. Appending to many files randomly, 4 KB at a time, can result in highly fragmented files, which is not ideal.
Working with memory-mapped files can be more difficult than using a plain DataXxxxStream or BufferedReader/Writer.
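To give a feel for the extra work mentioned in the last point, the sketch below reads the same small record twice: once with `DataInputStream`, and once through a mapped buffer, where the layout written by `writeInt` and `writeUTF` has to be decoded by hand. The class name `RecordReaders`, its methods, and the record format are invented for this example, and the manual decoding assumes ASCII-only text (for which the modified UTF-8 written by `writeUTF` is plain ASCII).

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of any library.
final class RecordReaders {

    /** Writes one record (int id, then a writeUTF string) to a temp file. */
    static Path writeRecord(int id, String asciiName) {
        try {
            Path file = Files.createTempFile("record", ".bin");
            file.toFile().deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
                out.writeInt(id);
                out.writeUTF(asciiName);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stream version: the library knows the encoding of each field. */
    static String readWithStream(Path file) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int id = in.readInt();
            String name = in.readUTF();
            return id + ":" + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mapped version: the caller must know and decode the byte layout. */
    static String readWithMapping(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // ByteBuffer defaults to big-endian, matching DataOutputStream.
            int id = buffer.getInt();
            // writeUTF prefixes the string with an unsigned 2-byte length.
            int length = buffer.getShort() & 0xFFFF;
            byte[] bytes = new byte[length];
            buffer.get(bytes);
            // Assumption: ASCII-only text, so no modified UTF-8 decoding is needed.
            return id + ":" + new String(bytes, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeRecord(42, "mapped");
        System.out.println(readWithStream(file));
        System.out.println(readWithMapping(file));
    }
}
```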

I have written a couple of libraries that make memory-mapped files easier to work with. I would use memory mapping when ultra-low latency is critical, or when you need to read large amounts of data which you expect to be in the disk cache already and you want to make the most of your disk cache.

It's worth noting that memory mapping doesn't make your disk subsystem faster; if that is your limiting factor, it won't matter which way you read/write data.