
How to implement or emulate MADV_ZERO?

I would like to zero out a range of a memory-mapped file without triggering any I/O, so that I can sequentially overwrite huge files efficiently without incurring disk reads.

Calling std::memset(ptr, 0, length) causes pages that are not already in memory to be read from disk, even when entire pages are overwritten, which wrecks disk performance.

I would like to be able to do something like madvise(ptr, length, MADV_ZERO), which would zero out the range (similar to FALLOC_FL_ZERO_RANGE) so that accessing the specified range causes zero-fill page faults instead of regular I/O page faults.

Unfortunately, MADV_ZERO does not exist, even though the corresponding flag FALLOC_FL_ZERO_RANGE does exist for fallocate and can be used together with fwrite to achieve a similar effect, albeit without instant cross-process coherency.
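For illustration, the file-level route mentioned above could look roughly like the following. This is only a sketch: the helper name zero_file_range is hypothetical, and the fallback of writing zeros with pwrite when the filesystem rejects FALLOC_FL_ZERO_RANGE is an assumption added here, not something the question describes. It acts on the file descriptor, so it does not by itself address the coherency concern for existing mappings.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for fallocate() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero the byte range [off, off + len) of an open file.
 * Returns 0 on success, -1 with errno set on failure. */
int zero_file_range(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len > 0);

    /* Preferred path: ask the filesystem to zero the range itself. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    /* Fallback (an assumption of this sketch): write zeros explicitly. */
    static const char zeros[4096];
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            errno = EIO;
            return -1;
        }
        off += n;
        len -= n;
    }
    return 0;
}
```

A caller would invoke it as zero_file_range(fd, offset, length) on a descriptor opened for writing. Whether the preferred path avoids disk reads depends on the filesystem.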

One possible alternative, I would guess, is MADV_REMOVE. However, from my understanding it can cause file fragmentation and also blocks other operations while it completes, which makes me unsure of its long-term performance implications. My experience on Windows is that the similar FSCTL_SET_ZERO_DATA command can cause significant performance spikes when invoked.
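As a rough sketch of what the MADV_REMOVE alternative looks like from user space: the helper name zero_mapped_range is hypothetical, and falling back to memset when the kernel or filesystem refuses MADV_REMOVE is an assumption of this example. It says nothing about the fragmentation or latency concerns raised above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MADV_REMOVE */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: make [addr, addr + len) of a writable MAP_SHARED
 * mapping read back as zeros. addr and len are expected to be page-aligned.
 * Returns 1 if the backing store was hole-punched via MADV_REMOVE,
 * 0 if the sketch had to fall back to a plain memset. */
int zero_mapped_range(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    assert((uintptr_t)addr % (uintptr_t)page == 0);
    assert(len > 0 && len % (size_t)page == 0);

    /* Frees the pages and the backing store; later accesses see zeros. */
    if (madvise(addr, len, MADV_REMOVE) == 0)
        return 1;

    /* Fallback (an assumption of this sketch): may fault pages in from disk. */
    memset(addr, 0, len);
    return 0;
}
```

The return value lets a caller measure how often the cheap path was actually taken on a given filesystem.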

My question is: how could one implement or emulate MADV_ZERO for shared mappings, preferably in user mode?

1. /dev/zero

I have read the suggestion to simply read /dev/zero into the selected range, though I am not quite sure what "reading into the range" means or how to do it. Is it like an fread from /dev/zero into the memory range? I am not sure how that would avoid a regular page fault on access.

For Linux, simply read /dev/zero into the selected range. The kernel already optimises this case for anonymous mappings.

If doing it in general turns out to be too hard to implement, I propose MADV_ZERO should have this effect: exactly like reading /dev/zero into the range, but always efficient.

EDIT: Following the thread a bit further, it turns out that this will not actually work:

It does not do tricks when you are dealing with a shared mapping.
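For reference, "reading /dev/zero into the range" presumably means a read(2) call whose destination buffer is the mapped memory itself. A minimal sketch follows, with the helper name read_zeros_into being hypothetical. As the quoted thread notes, no special trick applies to shared mappings, so there this is just another way of writing zeros.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill [dst, dst + len) by reading from /dev/zero.
 * Returns 0 on success, -1 with errno set on failure. */
int read_zeros_into(void *dst, size_t len)
{
    assert(dst != NULL);

    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
        return -1;

    char *p = dst;
    while (len > 0) {
        ssize_t n = read(fd, p, len); /* the target buffer is the range itself */
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {
            int saved = (n < 0) ? errno : EIO;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fd);
    return 0;
}
```

For a mapping this would be called as read_zeros_into(ptr, length), with ptr pointing into the mmap'ed region.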

2. MADV_REMOVE

One guess at implementing it in Linux (i.e. in the kernel rather than in the user application, although the latter is what I would prefer) is to copy and modify MADV_REMOVE, i.e. madvise_remove, so that it uses FALLOC_FL_ZERO_RANGE instead of FALLOC_FL_PUNCH_HOLE. I am a bit in over my head in guessing this, though, especially as I don't quite understand what the code around vfs_fallocate is doing:

// mm/madvise.c
static long madvise_remove(...)
{
  ...
  /*
   * Filesystem's fallocate may need to take i_mutex.  We need to
   * explicitly grab a reference because the vma (and hence the
   * vma's reference to the file) can go away as soon as we drop
   * mmap_sem.
   */
  get_file(f); // Increment ref count.
  up_read(&current->mm->mmap_sem); // Release a read lock? Why?
  error = vfs_fallocate(f,
            FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, // FALLOC_FL_ZERO_RANGE?
            offset, end - start);
  fput(f); // Decrement ref count.
  down_read(&current->mm->mmap_sem); // Acquire read lock. Why?
  return error;
}

You probably cannot do what you want (in user space, without hacking the kernel). Notice that writing zero pages might not incur physical disk I/O because of the page cache.

You might want to replace a file segment with a file hole in a sparse file (though this is not exactly what you want), but some file systems (e.g. VFAT) don't support holes or sparse files. See lseek(2) with SEEK_HOLE, and ftruncate(2).
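To make the two man-page pointers concrete, here is a small sketch of locating a hole with lseek and SEEK_HOLE. The helper name first_hole_from is hypothetical, and treating EINVAL as "no hole support, so the only hole is at end of file" is an assumption of this example. A hole can be created beforehand by extending the file with ftruncate past its written data.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for SEEK_HOLE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: offset of the first hole at or after `from`,
 * or -1 with errno set. Note that lseek moves the file offset. */
off_t first_hole_from(int fd, off_t from)
{
    assert(from >= 0);

    off_t pos = lseek(fd, from, SEEK_HOLE);
    if (pos >= 0 || errno != EINVAL)
        return pos;

    /* Fallback (an assumption of this sketch): without SEEK_HOLE support,
     * report the implicit hole at end of file. */
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (from >= st.st_size) {
        errno = ENXIO;
        return -1;
    }
    return st.st_size;
}
```

On a filesystem without sparse-file support the result is simply the file size, since the whole file is then reported as data.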
