
Why mmap() (Memory Mapped File) is faster than read()

I was recently working on something involving Java NIO's MappedByteBuffer. I've read several posts about it, and all of them claim that "mmap() is faster than read()".

My understanding so far:

  1. I treat MappedByteBuffer == Memory Mapped File == mmap()

  2. read() has to move data through: disk file -> kernel -> application, so it involves context switches and buffer copying.

  3. They all say mmap() involves less copying or fewer syscalls than read(), but as far as I know it also needs to read from the disk file the first time you access the file data. So the first access goes: virtual address -> memory -> page fault -> disk file -> kernel -> memory. Unless you access it randomly, the last three steps (disk file -> kernel -> memory) are exactly the same as read(), so how can mmap() involve less copying or fewer syscalls than read()?

  4. What's the relationship between mmap() and the swap file? Is it that the OS puts the least-used file data in memory into swap (LRU)? So the second time you access that data, the OS retrieves it from swap rather than the disk file (no need to copy into a kernel buffer), and that's why mmap() involves less copying and fewer syscalls?

  5. In Java, a MappedByteBuffer is allocated outside the heap (it's a direct buffer). So when you read from a MappedByteBuffer, does that mean it needs one more memory copy, from outside the Java heap into the Java heap?

Could anyone answer my questions? Thanks :)

1: Yes, that is essentially what a MappedByteBuffer is.

2: "disk file -> kernel" doesn't necessarily involve copying.

3: With a memory-mapped file, once the kernel has read the file into its cache, it can simply map that part of the cache into your process - instead of having to copy the data from the cache into a location your process specifies.
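As a minimal sketch of that point, the following Java program (using a hypothetical scratch file created only for illustration) maps a file and reads it through plain memory accesses; the map() call itself copies nothing into a user-supplied buffer:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class MapDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file, created only for this demo.
        Path path = Files.createTempFile("mapdemo", ".txt");
        Files.write(path, "hello".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // map() asks the kernel to expose the file's page-cache pages
            // directly in this process's address space; no data is copied
            // into a user-supplied buffer at this point.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Each get(i) is a plain memory load from the mapped pages.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < buf.limit(); i++) {
                sb.append((char) buf.get(i));
            }
            System.out.println(sb);
        }
        Files.deleteIfExists(path);
    }
}
```

The first access to each page still triggers a page fault and disk I/O, but no copy from the kernel cache into a separate user buffer is required.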

4: If the kernel decides to swap out a page from a memory-mapped file, it will not write the page to the page file; it will write the page to the original file (the one it's mapped from) before discarding the page. Writing it to the page file would be unnecessary and waste page file space.

5: Yes. For example, if you call get(byte[]) then the data will be copied from the off-heap mapping into your array. Note that functions such as get(byte[]) need to copy data for any type of buffer - this is not specific to memory-mapped files.

You're comparing apples and oranges. mmap() is 'faster than read()' because it doesn't do any I/O. The I/O is deferred to when you access the memory addresses resulting from the map. That I/O is much the same as read(), and whether that I/O is faster than read() is a pretty moot point. I would want to see a proper benchmark before I accepted that.

I treat MappedByteBuffer == Memory Mapped File == mmap()

OK.

read() has to move data through: disk file -> kernel -> application, so it involves two context switches and buffer copying

Compared to what?

They all say mmap() involves less copying or fewer syscalls than read(),

It makes fewer system calls. Whether it does less copying depends on the implementation. It is certainly possible for data to be read and written directly via DMA, but whether a specific operating system does that is operating-system-specific.
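The "fewer system calls" part is easy to see from the read() side: each read() call is at least one syscall plus a copy from the kernel's cache into the caller's buffer, so consuming a file in chunks costs one call per chunk, whereas a mapping is set up with a single call. A small sketch (with a hypothetical 8 KiB scratch file):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class ReadLoopDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical 8 KiB scratch file for illustration.
        Path path = Files.createTempFile("readloop", ".bin");
        Files.write(path, new byte[8192]);

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.allocate(4096);
            int calls = 0, total = 0, r;
            // Each read() is (at least) one system call plus a copy from
            // the kernel's cache into dst, so the whole file costs several
            // calls; mapping it would be a single map() call up front.
            while ((r = ch.read(dst)) > 0) {
                calls++;
                total += r;
                dst.clear();
            }
            System.out.println(calls + " calls, " + total + " bytes");
        }
        Files.deleteIfExists(path);
    }
}
```

With a mapping, later accesses are plain memory loads; the kernel fills pages on demand via page faults rather than explicit syscalls.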

but as far as I know it also needs to read from the disk file the first time you access the file data.

Correct.

So the first access goes: virtual address -> memory -> page fault -> disk file -> kernel -> memory. Unless you access it randomly, the last three steps (disk file -> kernel -> memory) are exactly the same as read(), so how can mmap() involve less copying or fewer syscalls than read()?

Because of DMA, if implemented.

What's the relationship between mmap() and the swap file?

The memory allocated to the map is part of the process's address space; it is virtual and subject to swapping, and there has to be room in the swap file for it, just like any other piece of memory.

Is it that the OS puts the least-used file data in memory into swap (LRU)?

No.

So the second time you access that data, the OS retrieves it from swap rather than the disk file (no need to copy into a kernel buffer), and that's why mmap() involves less copying and fewer syscalls?

No. To do all that would be quite wrong.

In Java, a MappedByteBuffer is allocated outside the heap (it's a direct buffer).

That doesn't quite make sense. Direct buffers aren't allocated out of the Java heap; they are allocated by mmap(), or whatever the platform API is, as new memory outside the heap. You are correct that a MappedByteBuffer is a direct buffer.

So when you read from a MappedByteBuffer, does that mean it needs one more memory copy, from outside the Java heap into the Java heap?

Yes, but not for the reason above. The reason is that you have to call MappedByteBuffer.get()/put(), which is itself an extra step.
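That extra step can be sketched as follows: get(byte[]) copies bytes from the off-heap mapping into an ordinary on-heap array (the file name and contents here are illustrative only):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class HeapCopyDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file for illustration.
        Path path = Files.createTempFile("copydemo", ".txt");
        Files.write(path, "abc".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // get(byte[]) copies from the off-heap mapping into this
            // on-heap array -- the extra step the answer refers to.
            byte[] onHeap = new byte[(int) ch.size()];
            buf.get(onHeap);
            System.out.println(new String(onHeap, StandardCharsets.US_ASCII));
        }
        Files.deleteIfExists(path);
    }
}
```

As the first answer notes, this bulk-get copy is not specific to mapped buffers; any buffer type must copy when you extract its contents into a byte[].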
