
Why mmap() (Memory Mapped File) is faster than read()

I was recently working on something involving Java NIO's MappedByteBuffer. I've read several posts about it, and all of them claim that "mmap() is faster than read()".

My understanding so far:

  1. I treat MappedByteBuffer == Memory Mapped File == mmap()

  2. read() has to move data through: disk file -> kernel -> application, so it involves context switches and buffer copying.

  3. They all say mmap() involves less copying or fewer syscalls than read(), but as far as I know it also needs to read from the disk file the first time you access the file data. So the first access goes: virtual address -> memory -> page fault -> disk file -> kernel -> memory. Unless you access it randomly, the last three steps (disk file -> kernel -> memory) are exactly the same as read(), so how can mmap() involve less copying or fewer syscalls than read()?

  4. What's the relationship between mmap() and the swap file? Is it that the OS puts the least-used file data in memory into swap (LRU)? So the second time you access that data, the OS retrieves it from swap rather than the disk file (no need to copy into a kernel buffer), and that's why mmap() involves less copying and fewer syscalls?

  5. In Java, a MappedByteBuffer is allocated outside the heap (it's a direct buffer). So when you read from a MappedByteBuffer, does that mean it needs one more memory copy, from outside the Java heap into the Java heap?

Could anyone answer my questions? Thanks :)

1: Yes, that is essentially what a MappedByteBuffer is.

2: "disk file -> kernel" doesn't necessarily involve copying.

3: With a memory-mapped file, once the kernel has read the file into its cache, it can simply map that part of the cache into your process - instead of having to copy the data from the cache into a location your process specifies.
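As a minimal sketch of that point, the following Java program (using a hypothetical scratch file created only for illustration) maps a file and reads it through plain memory accesses; the map() call itself copies nothing into a user-supplied buffer:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class MapDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file, created only for this demo.
        Path path = Files.createTempFile("mapdemo", ".txt");
        Files.write(path, "hello".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // map() asks the kernel to expose the file's page-cache pages
            // directly in this process's address space; no data is copied
            // into a user-supplied buffer at this point.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Each get(i) is a plain memory load from the mapped pages.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < buf.limit(); i++) {
                sb.append((char) buf.get(i));
            }
            System.out.println(sb);
        }
        Files.deleteIfExists(path);
    }
}
```

The first access to each page still triggers a page fault and disk I/O, but no copy from the kernel cache into a separate user buffer is required.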

4: If the kernel decides to swap out a page from a memory-mapped file, it will not write the page to the page file; it will write the page to the original file (the one it's mapped from) before discarding the page. Writing it to the page file would be unnecessary and waste page file space.

5: Yes. For example, if you call get(byte[]) then the data will be copied from the off-heap mapping into your array. Note that functions such as get(byte[]) need to copy data for any type of buffer - this is not specific to memory-mapped files.

You're comparing apples and oranges. mmap() is 'faster than read()' because it doesn't do any I/O. The I/O is deferred to when you access the memory addresses resulting from the map. That I/O is much the same as read(), and whether that I/O is faster than read() is a pretty moot point. I would want to see a proper benchmark before I accepted that.

I treat MappedByteBuffer == Memory Mapped File == mmap()

OK.

read() has to move data through: disk file -> kernel -> application, so it involves two context switches and buffer copying

Compared to what?

They all say mmap() involves less copying or fewer syscalls than read(),

It makes fewer system calls. Whether it does less copying depends on the implementation. It is certainly possible for data to be read and written directly via DMA, but whether a specific operating system does that is operating-system-specific.
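The "fewer system calls" part is easy to see from the read() side: each read() call is at least one syscall plus a copy from the kernel's cache into the caller's buffer, so consuming a file in chunks costs one call per chunk, whereas a mapping is set up with a single call. A small sketch (with a hypothetical 8 KiB scratch file):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class ReadLoopDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical 8 KiB scratch file for illustration.
        Path path = Files.createTempFile("readloop", ".bin");
        Files.write(path, new byte[8192]);

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.allocate(4096);
            int calls = 0, total = 0, r;
            // Each read() is (at least) one system call plus a copy from
            // the kernel's cache into dst, so the whole file costs several
            // calls; mapping it would be a single map() call up front.
            while ((r = ch.read(dst)) > 0) {
                calls++;
                total += r;
                dst.clear();
            }
            System.out.println(calls + " calls, " + total + " bytes");
        }
        Files.deleteIfExists(path);
    }
}
```

With a mapping, later accesses are plain memory loads; the kernel fills pages on demand via page faults rather than explicit syscalls.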

but as far as I know it also needs to read from the disk file the first time you access the file data.

Correct.

So the first access goes: virtual address -> memory -> page fault -> disk file -> kernel -> memory. Unless you access it randomly, the last three steps (disk file -> kernel -> memory) are exactly the same as read(), so how can mmap() involve less copying or fewer syscalls than read()?

Because of DMA, if implemented.

What's the relationship between mmap() and the swap file?

The memory allocated to the map is part of the process's address space; it is virtual and subject to swapping, and there has to be room in the swap file for it, just like any other piece of memory.

Is it that the OS puts the least-used file data in memory into swap (LRU)?

No.

So the second time you access that data, the OS retrieves it from swap rather than the disk file (no need to copy into a kernel buffer), and that's why mmap() involves less copying and fewer syscalls?

No. To do all that would be quite wrong.

In Java, a MappedByteBuffer is allocated outside the heap (it's a direct buffer).

That doesn't quite make sense. Direct buffers aren't allocated out of the Java heap; they are allocated by mmap(), or whatever the platform API is, as new memory outside the heap. You are correct that a MappedByteBuffer is a direct buffer.

So when you read from a MappedByteBuffer, does that mean it needs one more memory copy, from outside the Java heap into the Java heap?

Yes, but not for the reason above. The reason is that you have to call MappedByteBuffer.get()/put(), which is itself an extra step.
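That extra step can be sketched as follows: get(byte[]) copies bytes from the off-heap mapping into an ordinary on-heap array (the file name and contents here are illustrative only):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class HeapCopyDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical scratch file for illustration.
        Path path = Files.createTempFile("copydemo", ".txt");
        Files.write(path, "abc".getBytes(StandardCharsets.US_ASCII));

        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // get(byte[]) copies from the off-heap mapping into this
            // on-heap array -- the extra step the answer refers to.
            byte[] onHeap = new byte[(int) ch.size()];
            buf.get(onHeap);
            System.out.println(new String(onHeap, StandardCharsets.US_ASCII));
        }
        Files.deleteIfExists(path);
    }
}
```

As the first answer notes, this bulk-get copy is not specific to mapped buffers; any buffer type must copy when you extract its contents into a byte[].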
