How safe are memory-mapped files for reading input files?

Mapping an input file into memory and then directly parsing data from the mapped memory pages can be a convenient and efficient way to read data from files.

However, this practice also seems fundamentally unsafe unless you can ensure that no other process writes to a mapped file, because even the data in private read-only mappings may change if the underlying file is written to by another process. (POSIX, for example, doesn't specify "whether modifications to the underlying object done after the MAP_PRIVATE mapping is established are visible through the MAP_PRIVATE mapping".)

If you wanted to make your code safe in the presence of external changes to the mapped file, you'd have to access the mapped memory only through volatile pointers and then be extremely careful about how you read and validate the input, which seems impractical for many use cases.
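One way to sidestep the "validate, then the bytes change underneath you" problem is to copy the mapped bytes into private memory first and only then validate and parse the copy. The following is a minimal sketch of that idea in Python, not a definitive implementation; the record format (one length byte followed by a payload) and the name `parse_record` are made up for illustration.

```python
import mmap
import os
import tempfile


def parse_record(path):
    """Parse a hypothetical record: one length byte followed by the payload."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Copy the mapped bytes into process-private memory. From here on,
        # writes to the file cannot change what we validate and then use.
        snapshot = mm[:]

    length = snapshot[0]
    payload = snapshot[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\x03abcXYZ")
        print(parse_record(path))
    finally:
        os.close(fd)
        os.unlink(path)
```

The copy itself is not atomic, so the snapshot can still be a mix of old and new data if a writer is active during the copy. What it does guarantee is that validation and use see the same bytes. It also gives up the zero-copy benefit of mapping, which is the trade-off the question is worried about.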

Is this analysis correct? The documentation for memory mapping APIs generally mentions this issue only in passing, if at all, so I wonder whether I'm missing something.

It is not really a problem.

Yes, another process may modify the file while you have it mapped, and yes, it is possible that you will see the modifications. It is even likely, since almost all operating systems have unified virtual memory systems: unless a writer requests unbuffered writes, every write goes through the buffer cache, and anyone holding a mapping of those pages sees the change.
That isn't even a bad thing. Actually, it would be more disturbing if you couldn't see the changes. Since the file effectively becomes part of your address space when you map it, it makes perfect sense that you see changes to the file.
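The visibility of writes through an existing mapping is easy to observe. Here is a small sketch in Python, assuming a local filesystem on a system with a unified page cache (Linux, for instance). The second file object stands in for "another process", and `mapping_sees_write` is just an illustrative name.

```python
import mmap
import os
import tempfile


def mapping_sees_write():
    """Return the first mapped bytes before and after the file is rewritten."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        # ACCESS_READ gives a read-only mapping of the file.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
            before = mm[:5]
            # An independent descriptor plays the role of the other process.
            with open(path, "r+b") as writer:
                writer.write(b"HELLO")
                writer.flush()  # hand the bytes to the kernel; no fsync needed
            after = mm[:5]
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(mapping_sees_write())
```

Note that the writer never touches the mapping. It uses plain write calls, and the change still shows up in the mapped pages because both paths go through the same cache.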

If you use conventional I/O (such as read), someone can still modify the file while you are reading it. Put differently, copying file content to a memory buffer is not always safe in the presence of modifications either. It is "safe" insofar as read will not crash, but it does not guarantee that your data is consistent.
Unless you use readv, you have no guarantees about atomicity whatsoever (and even with readv you have no guarantee that what you have in memory is consistent with what is on disk, or that it doesn't change between two calls to readv). Someone might modify the file between two read operations, or even while you are in the middle of one.
This isn't just something that isn't formally guaranteed but "probably still works". On the contrary: under Linux, for example, writes are demonstrably not atomic. Not even by accident.
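If you want at least a hint that the file changed while you were reading it, you can compare its metadata before and after the read. This is a heuristic sketch in Python, not a guarantee: a writer that finishes within the timestamp granularity, or that keeps the size and resets the mtime, goes undetected. The name `read_if_stable` and the retry count are assumptions for illustration.

```python
import os
import tempfile


def read_if_stable(path, attempts=3):
    """Read a whole file, retrying if its size or mtime changed meanwhile."""
    for _ in range(attempts):
        with open(path, "rb") as f:
            before = os.fstat(f.fileno())
            data = f.read()
            after = os.fstat(f.fileno())
        unchanged = (before.st_size, before.st_mtime_ns) == \
                    (after.st_size, after.st_mtime_ns)
        # Also require that we got exactly as many bytes as the file claims.
        if unchanged and len(data) == after.st_size:
            return data
    raise RuntimeError("file kept changing while it was being read")


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"some input")
        print(read_if_stable(path))
    finally:
        os.close(fd)
        os.unlink(path)
```

The same check works around a mapping: stat the descriptor, parse the mapped data, stat again, and discard the result if anything moved.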

The good news:
Usually, processes don't just open an arbitrary file and start writing to it. When such a thing happens, it is usually either a well-known file that belongs to the process (e.g. a log file), or a file that you explicitly told the process to write to (e.g. saving in a text editor), or the process creates a new file (e.g. a compiler creating an object file), or the process merely appends to an existing file (e.g. database journals and, of course, log files). Or a process might atomically replace a file with another one (or unlink it).

In every case, the whole scary problem boils down to "no issue" because either you are well aware of what will happen (so it's your responsibility), or it works seamlessly without interfering.

If you really don't like the possibility that another process could write to your file while you have it mapped, you can simply omit FILE_SHARE_WRITE under Windows when you create the file handle. POSIX makes it somewhat more complicated, since you need to use fcntl on the descriptor to request a mandatory lock, which isn't necessarily supported or 100% reliable on every system (for example, under Linux).
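Where mandatory locks aren't available, cooperating processes can at least agree on advisory locks. Note that this is a different and weaker technique than the mandatory lock mentioned above: it only helps if the writer also asks for the lock, and it does nothing against a process that ignores it. A Unix-only sketch in Python using `fcntl.flock`, where `try_exclusive` and `demo` are illustrative names:

```python
import fcntl
import os
import tempfile


def try_exclusive(fd):
    """Try to take an exclusive advisory lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def demo():
    """Return (writer blocked while reader holds lock, writer allowed after)."""
    fd_reader, path = tempfile.mkstemp()
    # A second open of the same file stands in for a cooperating writer.
    fd_writer = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd_reader, fcntl.LOCK_SH)   # reader: file is in use
        blocked = not try_exclusive(fd_writer)  # writer checks before writing
        fcntl.flock(fd_reader, fcntl.LOCK_UN)   # reader is done
        allowed = try_exclusive(fd_writer)
        return blocked, allowed
    finally:
        os.close(fd_writer)
        os.close(fd_reader)
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The reader would take the shared lock before mapping the file and release it after unmapping.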

In theory, you're probably in real trouble if someone does modify the file while you're reading it. In practice, you're reading characters and nothing else: no pointers, or anything else which could get you into trouble. Formally, I think it's still undefined behavior, but it's one which I don't think you have to worry about. Unless the modifications are very minor, you'll get a lot of compiler errors, but that's about the end of it.

The one case which might cause problems is if the file is shortened, so that you end up reading beyond its new end. (POSIX specifies that touching a mapped page which lies wholly beyond the end of the file raises SIGBUS.)

And finally: the system isn't arbitrarily going to open and modify the file. It's a source file; it will be some idiot programmer who does it, and he deserves what he gets. In no case will your undefined behavior corrupt the system or other people's files.

Note too that most editors work on a private copy; when they write back, they do so by renaming the original and creating a new file. Under Unix, once you've opened the file to mmap it, all that counts is the inode number. When the editor renames or deletes the file, you still keep your copy, and the modified file gets a new inode. The only thing you have to worry about is if someone opens the file for update and then goes around modifying it. Not many programs do this on text files, except for appending additional data to the end.
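The inode behaviour can be checked directly. The sketch below (Python, POSIX only) uses one common variant of the editor-style save: write the new contents to a temporary file, then rename it over the old name. The file names and the function name `replace_while_mapped` are made up for the example.

```python
import mmap
import os
import tempfile


def replace_while_mapped():
    """Replace a file by rename while it is mapped; report what each side sees."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "input.txt")
        with open(path, "wb") as f:
            f.write(b"old contents")

        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            mapped_inode = os.fstat(f.fileno()).st_ino
            # Editor-style save: new file first, then rename over the old name.
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as new:
                new.write(b"new contents")
            os.replace(tmp_path, path)
            still_mapped = mm[:]                  # the mapping keeps the old inode
            path_inode = os.stat(path).st_ino     # the name now points elsewhere

        with open(path, "rb") as f:
            on_disk = f.read()
    return still_mapped, on_disk, mapped_inode != path_inode


if __name__ == "__main__":
    print(replace_while_mapped())
```

The mapping keeps showing the old contents for as long as it exists, while anyone opening the path afterwards gets the new file.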

So while there is formally some risk, I don't think you have to worry about it. (If you're really paranoid, you could turn off write authorisation while the file is mmap'ed. And if there's really an enemy agent out to get you, he can turn it right back on.)

