mmap, msync and Linux process termination

I want to use mmap to persist certain portions of program state in a C program running under Linux, by associating a fixed-size struct with a well-known file name using mmap() with the MAP_SHARED flag set. For performance reasons, I would prefer not to call msync() at all, and no other programs will be accessing this file. When my program terminates and is restarted, it will map the same file again and do some processing on it to recover the state it was in before termination. My question is this: if I never call msync() on the mapping, will the kernel guarantee that all updates to the memory get written to disk and are subsequently recoverable, even if my process is terminated with SIGKILL? Also, will there be general system overhead from the kernel periodically writing the pages to disk even if my program never calls msync()?

EDIT: I've settled the question of whether the data is written, but I'm still not sure whether this will cause unexpected system load compared with handling the problem using open()/write()/fsync() and accepting the risk that some data might be lost if the process gets hit by KILL/SEGV/ABRT/etc. Added a 'linux-kernel' tag in hopes that some knowledgeable person might chime in.
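For comparison, the open()/write()/fsync() alternative mentioned above might look roughly like the following. This is only a sketch: the helper name `save_state_fsync` and the file mode are made up for illustration, and a short write is simply treated as an error rather than retried.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `buf` at the start of `path`
 * and force them to storage with fsync().
 * Returns 0 on success, -1 on any error (including a short write). */
int save_state_fsync(const char *path, const void *buf, size_t len) {
  assert(buf != NULL);

  int fd = open(path, O_WRONLY | O_CREAT, 0600);
  if (fd < 0) return -1;

  /* write() copies the data into the page cache ... */
  ssize_t n = write(fd, buf, len);
  /* ... and fsync() blocks until the kernel has pushed it to the device. */
  int ok = (n == (ssize_t)len) && fsync(fd) == 0;

  if (close(fd) < 0) ok = 0;
  return ok ? 0 : -1;
}
```

Data that never made it into a write() call is lost if the process is killed, which is the risk described above; the fsync() call is what costs throughput.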

I found a comment from Linus Torvalds that answers this question: http://www.realworldtech.com/forum/?threadid=113923&curpostid=114068

The mapped pages are part of the filesystem cache, which means that even if the user process that changed a page dies, the page is still managed by the kernel, and since all concurrent accesses to that file go through the kernel, other processes will be served from that cache. In some old Linux kernels this was different, which is why some kernel documents still say to force msync.
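One way to see the shared cache at work is to store through a MAP_SHARED mapping and then read the same bytes back with pread() on the file descriptor, without calling msync() in between. The sketch below assumes a modern Linux kernel and an ordinary filesystem; the helper name `store_visible_without_msync` is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store `msg` through a MAP_SHARED mapping of `path`,
 * then read it back with pread() on the same descriptor, with no msync().
 * Returns 1 if pread() sees the stored bytes, 0 if not, -1 on error.
 * The scratch file is removed before returning. */
int store_visible_without_msync(const char *path, const char *msg) {
  char buf[256];
  size_t len = strlen(msg) + 1;        /* include the terminating NUL */
  assert(len <= sizeof buf);

  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  if (fd < 0) return -1;
  if (ftruncate(fd, (off_t)len) < 0) { close(fd); unlink(path); return -1; }

  char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED) { close(fd); unlink(path); return -1; }

  memcpy(map, msg, len);               /* dirties a page-cache page */

  ssize_t n = pread(fd, buf, len, 0);  /* read() path, same page cache */
  int seen = (n == (ssize_t)len) && memcmp(buf, msg, len) == 0;

  munmap(map, len);
  close(fd);
  unlink(path);
  return seen;
}
```

This only shows that the store is visible through the kernel's cache; it says nothing about when the page reaches the disk.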

EDIT: Thanks to RobH for correcting the link.

EDIT:

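A new flag, MAP_SYNC, was introduced in Linux 4.15, which can guarantee this coherence. According to the mmap(2) man page, it is available only with the MAP_SHARED_VALIDATE mapping type and is supported only for files that support DAX (direct mapping of persistent memory); for other files the mapping fails with EOPNOTSUPP.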

Shared file mappings with this flag provide the guarantee that while some memory is writably mapped in the address space of the process, it will be visible in the same file at the same offset even after the system crashes or is rebooted.

references:

http://man7.org/linux/man-pages/man2/mmap.2.html (search for MAP_SYNC in the page)

https://lwn.net/Articles/731706/
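A rough sketch of how MAP_SYNC might be requested, falling back to a plain MAP_SHARED mapping when it is refused. The helper name `map_prefer_sync` is hypothetical; the sketch assumes the flag is used together with MAP_SHARED_VALIDATE as the man page describes, and that on most ordinary (non-DAX) filesystems the first mmap() will simply fail and the fallback will be taken.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                    /* expose MAP_SYNC on older glibc setups */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or reuse) `path`, size it to `len` bytes and
 * map it, preferring MAP_SYNC. Sets *used_sync to 1 if the MAP_SYNC mapping
 * succeeded, 0 if the plain MAP_SHARED fallback was used.
 * Returns MAP_FAILED if no mapping could be created. */
void *map_prefer_sync(const char *path, size_t len, int *used_sync) {
  assert(used_sync != NULL);
  *used_sync = 0;

  int fd = open(path, O_RDWR | O_CREAT, 0600);
  if (fd < 0) return MAP_FAILED;
  if (ftruncate(fd, (off_t)len) < 0) { close(fd); return MAP_FAILED; }

  void *map = MAP_FAILED;
#if defined(MAP_SYNC) && defined(MAP_SHARED_VALIDATE)
  /* Expected to fail unless the file lives on DAX-capable storage. */
  map = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
  if (map != MAP_FAILED) *used_sync = 1;
#endif
  if (map == MAP_FAILED)               /* fallback: ordinary shared mapping */
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

  close(fd);                           /* the mapping outlives the descriptor */
  return map;
}
```

When the fallback is taken, the crash guarantee quoted above does not apply and msync() is still needed for durability across a system crash.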

I decided to be less lazy and settle the question of whether the data is written to disk by writing some code. The answer is that it will be written.

Here is a program that kills itself abruptly after writing some data to an mmap'd file:

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>   /* kill() */
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

typedef struct {
  char data[100];
  uint16_t count;
} state_data;

const char *test_data = "test";

int main(int argc, const char *argv[]) {
  int fd = open("test.mm", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0700);
  if (fd < 0) {
    perror("Unable to open file 'test.mm'");
    exit(1);
  }
  size_t data_length = sizeof(state_data);
  if (ftruncate(fd, data_length) < 0) {
    perror("Unable to truncate file 'test.mm'");
    exit(1);
  }
  state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0);
  if (MAP_FAILED == data) {
    perror("Unable to mmap file 'test.mm'");
    close(fd);
    exit(1);
  }
  memset(data, 0, data_length);
  for (data->count = 0; data->count < 5; ++data->count) {
    data->data[data->count] = test_data[data->count];
  }
  /* Signal 9 is SIGKILL: the process dies without running handlers or calling munmap()/msync(). */
  kill(getpid(), 9);
}

Here is a program that validates the resulting file after the previous program is dead:

#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>

typedef struct {
  char data[100];
  uint16_t count;
} state_data;

const char *test_data = "test";

int main(int argc, const char *argv[]) {
  int fd = open("test.mm", O_RDONLY);
  if (fd < 0) {
    perror("Unable to open file 'test.mm'");
    exit(1);
  }
  size_t data_length = sizeof(state_data);
  state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ, MAP_SHARED|MAP_POPULATE, fd, 0);
  if (MAP_FAILED == data) {
    perror("Unable to mmap file 'test.mm'");
    close(fd);
    exit(1);
  }
  assert(5 == data->count);
  unsigned index;
  for (index = 0; index < 4; ++index) {
    assert(test_data[index] == data->data[index]);
  }
  printf("Validated\n");
}

I found something that adds to my confusion:

munmap does not affect the object that was mapped; that is, the call to munmap does not cause the contents of the mapped region to be written to the disk file. The updating of the disk file for a MAP_SHARED region happens automatically by the kernel's virtual memory algorithm as we store into the memory-mapped region.

This is excerpted from Advanced Programming in the UNIX® Environment.

From the Linux manpage:

MAP_SHARED Share this mapping with all other processes that map this object. Storing to the region is equivalent to writing to the file. The file may not actually be updated until msync(2) or munmap(2) are called.

The two seem contradictory. Is APUE wrong?

I did not find a very precise answer to your question, so I decided to add one more:

  1. First, about losing data: both write and mmap/memcpy write to the page cache, and the OS syncs dirty pages to the underlying storage in the background according to its writeback settings. For example, Linux has vm.dirty_expire_centisecs, which determines how old a dirty page must be before it is eligible to be flushed to disk, and vm.dirty_writeback_centisecs, which sets how often the kernel flusher threads wake up to do that work. Even if your process dies after the write call has succeeded, the data is not lost, because it is already present in kernel pages that will eventually be written to storage. The only case in which you would lose data is if the OS itself crashes (kernel panic, power off, etc.). The way to make absolutely sure your data has reached storage is to call fsync or msync (for mmapped regions), as the case may be.
  2. About the system load concern: yes, calling msync/fsync for each request is going to slow your throughput drastically, so do that only if you have to. Remember that you are really only protecting against losing data on OS crashes, which I would assume are rare and probably something most could live with. One common optimization is to issue the sync at regular intervals, say every 1 second, to get a good balance.
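The "sync at regular intervals" idea from point 2 could be sketched as below. The helper names `map_state_file` and `sync_if_due` are hypothetical, and wall-clock time() is used only for brevity; a real program might prefer a monotonic clock or a timer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create (or reuse) `path`, size it to `len` bytes and
 * map it MAP_SHARED. Returns MAP_FAILED on error. */
void *map_state_file(const char *path, size_t len) {
  int fd = open(path, O_RDWR | O_CREAT, 0600);
  if (fd < 0) return MAP_FAILED;
  void *map = MAP_FAILED;
  if (ftruncate(fd, (off_t)len) == 0)
    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);                           /* the mapping outlives the descriptor */
  return map;
}

/* Hypothetical helper: call msync(MS_SYNC) on the mapping only if at least
 * `interval_sec` seconds have passed since *last_sync.
 * `map` must be the page-aligned address returned by mmap().
 * Returns 1 if a flush was done, 0 if it was skipped, -1 if msync() failed. */
int sync_if_due(void *map, size_t len, time_t *last_sync, time_t interval_sec) {
  assert(map != NULL && last_sync != NULL);

  time_t now = time(NULL);
  if (now - *last_sync < interval_sec) return 0;   /* not due yet */

  if (msync(map, len, MS_SYNC) < 0) return -1;     /* blocks until written */
  *last_sync = now;
  return 1;
}
```

A caller would invoke `sync_if_due(map, len, &last, 1)` after each batch of updates, so that at most roughly one second of changes is exposed to an OS crash while most updates stay as cheap as a memory store.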

Either the Linux manpage information is incorrect or Linux is horribly non-conformant. msync is not supposed to have anything to do with whether the changes are committed to the logical state of the file, or whether other processes using mmap or read to access the file see the changes; it's purely an analogue of fsync and should be treated as a no-op except for the purposes of ensuring data integrity in the event of power failure or other hardware-level failure.

According to the manpage,

The file may not actually be updated until msync(2) or munmap() is called.

So you will need to make sure you call munmap() prior to exiting at the very least.
