
What is the difference between writing to a file and a mapped memory?

I have the following questions about handling files and mapping them (mmap):

  1. We know that if we create a file and write to it, we are writing to memory either way. Then why map the file into memory with mmap and write to that?
  2. If the reason is the protection we get from mmap (PROT_NONE, PROT_READ, PROT_WRITE), the same level of protection can be achieved with ordinary file access modes (O_RDONLY, O_RDWR, etc.). Then why mmap?
  3. Is there any special advantage to mapping a file into memory and then using it, rather than just creating a file and writing to it?
  4. Finally, suppose we mmap a file into memory. If we write to the memory location returned by mmap, is the file written at the same time?

Edit: sharing files between threads

As far as I know, if we share a file between two threads (not processes), it is advisable to mmap it into memory and use the mapping rather than using the file directly.

But we know that a file in use is surely in main memory already, so why do the threads need to mmap it again?

A memory-mapped file is partially or wholly mapped into memory (RAM), whereas data you write to an ordinary file is written to memory first and then flushed to disk. A memory-mapped file is taken from disk and placed into memory explicitly for reading and/or writing. It stays mapped into your address space until you unmap it.

Disk access is slow. Once data you have written to a file has been flushed to disk, the kernel is free to drop it from RAM, which means that the next time you need the file you may have to fetch it from disk again (slow). With a memory-mapped file, the pages you touch are normally already in RAM (although the kernel can still page them out under memory pressure), so access is faster than when the data is on disk.

Also, memory-mapped files are often used as an IPC mechanism: two or more processes can easily share the same file and read from and write to it (using the necessary synchronization mechanisms).
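
To make the IPC point concrete, here is a minimal sketch, assuming a POSIX system. The helper name share_via_mapping and the REGION_SIZE constant are made up for illustration. A child process writes a message into a file-backed MAP_SHARED region and the parent reads it back; waitpid is the only synchronization used, which is enough here because the parent reads only after the child has exited.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE 4096          /* hypothetical size of the shared region */

/* Hypothetical helper: a child process writes msg into a file-backed
 * MAP_SHARED region; the parent copies it into out after the child exits.
 * Returns 0 on success, -1 on error. */
int share_via_mapping(const char *path, const char *msg, char *out, size_t outlen)
{
    assert(path != NULL && msg != NULL && out != NULL);
    if (outlen == 0 || strlen(msg) >= REGION_SIZE)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* Size the file first; ftruncate fills the new space with zero bytes. */
    if (ftruncate(fd, REGION_SIZE) < 0) {
        close(fd);
        return -1;
    }

    char *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (region == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(region, REGION_SIZE);
        return -1;
    }
    if (pid == 0) {
        /* Child: the shared mapping is inherited across fork. */
        strcpy(region, msg);
        _exit(0);
    }

    /* Parent: waiting for the child is the synchronization step. */
    int status;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        strncpy(out, region, outlen - 1);
        out[outlen - 1] = '\0';
        rc = 0;
    }
    munmap(region, REGION_SIZE);
    return rc;
}
```

Two unrelated processes would each open and mmap the same path instead of relying on fork, and would need a real synchronization primitive (a semaphore, a lock, and so on) rather than waitpid.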

When you need to read a file often, and the file is quite large, it can be advantageous to map it into memory so that you have faster access to it than if you had to open it and fetch it from disk each time.
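
As a rough illustration of reading through a mapping, the sketch below (assuming a POSIX system; count_newlines is a made-up helper name) maps a file read-only and scans it like an ordinary byte array, with no read() calls and no user-space buffer.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by mapping it read-only.
 * Returns the count, or -1 on error. */
long count_newlines(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {        /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* The file contents are now addressable byte by byte. */
    long count = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```

Pages are faulted in on first touch, so only the parts of the file that are actually read need to be brought into RAM.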

EDIT:

That depends on your needs. When you have a file that will be accessed very frequently by different threads, I'm not sure that memory-mapping it is necessarily a good idea, because you'll need to synchronize access to the mmap'ed file if you want to write to the same places from different threads. If that happens very often, it could become a point of resource contention.

If you are only reading from the file, this might be a good solution, because you don't really need to synchronize access when multiple threads only read. The moment you start writing, you do have to use synchronization mechanisms.

If you have to write to the file, I suggest that each thread do its own file access in a thread-local way, just as you would with any other file. This reduces the need for thread synchronization and the likelihood of bugs that are hard to find and debug.

1) You misunderstand the write(2) system call. write() does not write to disk; it just copies the buffer contents into the OS buffer chain and marks those buffers as dirty. One of the OS threads (bdflush, IIRC) will later pick up these buffers, write them to disk, and fiddle with some flags. With mmap, you access the OS buffer directly (and if you alter its contents, it will be marked dirty, too).

2) This is not about protection; it is about setting flags in the page-table entries.

3) You avoid double buffering. You can also address the file in terms of characters instead of blocks, which is sometimes more practical.

4) It's the system buffers (hooked into your address space) that you have been using. The system may or may not have written parts of them to disk.

5) If the threads belong to the same process and share the page tables and address space, yes.
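
Points 1 and 4 can be observed from user space. The sketch below assumes a system with a unified page cache (Linux and the modern BSDs behave this way; POSIX alone does not promise it), and check_shared_view is a made-up name. Data written with pwrite() is visible through a MAP_SHARED mapping without any fsync, and a store through the mapping is visible to pread() without any msync, because both paths go through the same kernel buffers.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if write()-style I/O and a MAP_SHARED
 * mapping of the same file see each other's changes, -1 otherwise. */
int check_shared_view(const char *path)
{
    assert(path != NULL);

    const size_t len = 4096;
    char buf[8];
    int rc = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* 1. pwrite copies into the kernel's buffers; no fsync is issued,
     *    yet the mapping already shows the new bytes. */
    if (pwrite(fd, "via write", 9, 0) != 9)
        goto out;
    if (memcmp(map, "via write", 9) != 0)
        goto out;

    /* 2. A store through the mapping dirties the same buffers; no msync
     *    is issued, yet pread already returns the new bytes. */
    memcpy(map + 100, "via mmap", 8);
    if (pread(fd, buf, sizeof buf, 100) != (ssize_t)sizeof buf)
        goto out;
    if (memcmp(buf, "via mmap", 8) != 0)
        goto out;

    rc = 0;
out:
    munmap(map, len);
    close(fd);
    return rc;
}
```

Neither check says anything about what is on the disk at that moment; that is exactly the "may or may not have written" caveat in point 4.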

  1. One reason may be that you have (legacy) code that is set up to write into a data buffer, and this buffer is then written to the file in one go at the end. In this case, using mmap will save at least one copy of the data, as the OS can write the buffer to disk directly. As long as it is about file writing only, I cannot (yet) imagine any other reason why you'd want to use mmap.

  2. No, the protection is not relevant here I'd say.

  3. It might save one or two copies of the data (e.g., from app buffer to libc buffer to OS buffer); see point 1. This might make a performance difference when writing large amounts of data.

  4. No. As far as I know, the OS is free to write the data whenever it likes; the only guarantee is that the data has been carried through to the file once msync or munmap has been called on that memory region (and if you need it to have physically reached the disk, msync with MS_SYNC is the call to rely on). For most files it will likely not write anything in between the majority of the time, for performance reasons: writing a whole block to disk because one byte changed is rather expensive, particularly if many more modifications to the block are expected in the near future.
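
For point 4, a minimal sketch of forcing the write-back yourself, assuming a POSIX system (store_and_flush is a made-up helper name): the store only dirties the page, and msync with MS_SYNC blocks until the dirty pages have been written back.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of path with text by storing
 * through a shared mapping, then flush explicitly.
 * Returns 0 on success, -1 on error. */
int store_and_flush(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    size_t len = strlen(text);
    if (len == 0)
        return -1;                /* mmap rejects a zero-length mapping */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* A mapping cannot grow a file, so give the file its final size first. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    memcpy(map, text, len);       /* dirties the page; no disk I/O is forced */

    /* MS_SYNC waits until the dirty pages have been written back. */
    int rc = msync(map, len, MS_SYNC);

    if (munmap(map, len) < 0)
        rc = -1;
    return rc;
}
```

MAP_SHARED matters here: with MAP_PRIVATE the stores would go to a private copy-on-write page and never reach the file at all.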

In most cases you should treat a memory-mapped file as memory that you work with. You need to care only about special cases such as syncing with the disk. It's the same kind of storage as memory, but it can be initialized from a file and stored to a file whenever you need.
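
One way to picture "memory that is initialized from a file and stored back to it" is a counter that survives between calls. This is only a sketch, assuming a POSIX system, and bump_counter is a made-up name. The code increments an ordinary integer through a pointer, and the file is where that integer happens to live.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for POSIX.1-2008 declarations */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: increment a 64-bit counter stored at the start of
 * path and return its new value, or -1 on error. */
long bump_counter(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    /* New or short file: extend it; ftruncate zero-fills, so the counter
     * starts at 0. */
    if (st.st_size < (off_t)sizeof(uint64_t) &&
        ftruncate(fd, (off_t)sizeof(uint64_t)) < 0) {
        close(fd);
        return -1;
    }

    uint64_t *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid after close */
    if (counter == MAP_FAILED)
        return -1;

    *counter += 1;                /* plain memory access, no write() call */
    long value = (long)*counter;

    /* The "sync with disk" special case: flush before unmapping. */
    int rc = msync(counter, sizeof *counter, MS_SYNC);
    munmap(counter, sizeof *counter);
    return rc < 0 ? -1 : value;
}
```

The sketch does no locking, so two processes calling it at the same time could lose an update; it is meant to show the memory-like access pattern, not a safe shared counter.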
