
Optimal method to mmap a file to RAM?

I am using mmap to read a file, and I only recently found out that it does not actually load the file into RAM; it only creates a virtual address range for it. Any access to the data will therefore still go to disk, which I want to avoid, so I want to read it all into RAM.

I am reading the file via:

char* cs_virt;
cs_virt = (char*)mmap(0, nchars, PROT_READ, MAP_PRIVATE, finp, offset);

When I loop after this, I see that the virtual memory of the process has indeed grown by the size of the mapping. I want the data copied into RAM, though, so I do the following:

char* cs_virt;
cs_virt = (char*)mmap(0, nchars, PROT_READ, MAP_PRIVATE, finp, offset);
cs = (char*)malloc(nchars*sizeof(char));
for(int ichar = 0; ichar < nchars; ichar++) {
    cs[ichar] = cs_virt[ichar]; 
}

Is this the best method? If not, what is a more efficient way to do this? This takes place in a function, and cs is declared outside the function. Once I exit the function I will retain cs, but will cs_virt need to be deleted, or will it go away on its own since it is declared locally in the function?

If you are using Linux, you may be able to use MAP_POPULATE:

MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. MAP_POPULATE is supported for private mappings only since Linux 2.6.23.

This may be useful if you have time to spare when you call mmap() but your later accesses need to be responsive. Also consider MAP_LOCKED if you really need the file to stay mapped in and never be swapped back out.
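As a rough illustration of the MAP_POPULATE suggestion, here is a minimal Linux-only sketch. The helper names map_file_populated and unmap_file are hypothetical and do not come from the original post, and the sketch maps the whole file from offset 0 rather than using the asker's nchars and offset. It also shows that a mapping is released with munmap(); the pointer going out of scope at the end of a function does not unmap anything.

```c
#define _GNU_SOURCE          /* exposes MAP_POPULATE on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): maps a whole file
 * read-only and asks the kernel to prefault its pages at mmap() time.
 * Returns the mapping and stores its length in *len_out, or NULL on error. */
const char *map_file_populated(const char *path, size_t *len_out)
{
    assert(len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);           /* mmap() rejects zero-length mappings */
        return NULL;
    }

    /* MAP_POPULATE is Linux-specific; it is a request to prefault,
     * not a guarantee that the pages stay resident afterwards. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);               /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Releases the mapping. A local pointer leaving scope does not do this. */
void unmap_file(const char *p, size_t len)
{
    munmap((void *)p, len);
}
```

A caller would keep the returned pointer and length for as long as the data is needed and call unmap_file(p, len) afterwards. With the pages prefaulted, a separate malloc-and-copy loop may not be needed at all, although the kernel can still evict the pages under memory pressure unless they are locked.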

MPI and I/O is a murky issue. HDF5 seems to be the most common library that can help you with that, but it often needs tuning for the particular cluster, which is often impossible for mere users of the cluster. A colleague of mine had better success with SIONlib, and was able to get his code working on nearly 1e6 cores on JUGENE with it, so I'd have a look at that.

In both cases you will probably need to adapt your file format. In my colleague's case it even paid off to write the data in parallel using SIONlib, and to later do a sequential postprocessing pass to "defragment" the holes left by the parallel access pattern that SIONlib chose. It might be similar for input.
