
Performance degradation when using mmap

I have to compute a huge n x n matrix (n > 100000) and somehow store it in memory for further use. Computing a single element is quite expensive (a few thousand flops and memory accesses), so I can't compute it on the fly. However, I only need to compute it once and do not need to modify it later. I also can't assume that I have enough swap space on the system. That's why I decided to create a cache file and use mmap to map it into memory:

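#include <fcntl.h>
#include <unistd.h>
#include <string>

int createCacheFile(std::size_t filesize, std::string const& filename){
    //create empty file
    int fileDescriptor = open(filename.c_str(), O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
    if (fileDescriptor == -1) return -1;
    //stretch to desired size: seek to the last byte and write one byte there,
    //because lseek alone does not change the file size
    lseek(fileDescriptor, filesize-1, SEEK_SET);
    write(fileDescriptor, "", 1);
    return fileDescriptor;
}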

//...
std::size_t n = 100000;
std::size_t filesize = n*n*sizeof(float);
int fileDescriptor = createCacheFile(filesize,"matrix.cache");
float* memory = (float*) mmap(0, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, fileDescriptor, 0);

//and now fill it...

I wanted to compare performance, so I tried a smaller n = 10000 and compared malloc, mmap with MAP_ANONYMOUS, and the implementation above. For this n the matrix fits completely into RAM. While malloc and MAP_ANONYMOUS give quite similar results, computing the matrix is roughly 10 times slower when it is backed by a file. It seems that the kernel regularly stops the program so that it can write the contents safely to the file. I tried to resolve this by calling msync and mprotect on the parts of the matrix that I had already computed, to hint to the kernel that it can write those sections out without having to stop the program, but nothing helped.
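The post does not show the msync and mprotect calls themselves, so the following is only a guess at what such a hint could look like. The helper name sealComputedPrefix and its parameters are hypothetical, and it assumes the matrix is filled from the start of the mapping onward.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical helper, not from the original post.
// `base` must be the page-aligned address returned by mmap; `bytesDone` is the
// number of bytes at the start of the mapping that are already final.
// Returns 0 on success and -1 if msync or mprotect fails.
int sealComputedPrefix(void* base, std::size_t bytesDone) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    // Only whole pages can be sealed, so round down to a page boundary.
    const std::size_t sealed = bytesDone / page * page;
    if (sealed == 0) return 0;  // nothing complete yet
    // Ask for write-back without waiting for it to finish.
    if (msync(base, sealed, MS_ASYNC) == -1) return -1;
    // The finished part is never modified again, so drop write permission.
    if (mprotect(base, sealed, PROT_READ) == -1) return -1;
    return 0;
}
```

It would be called every few rows, for example sealComputedPrefix(memory, rowsDone * n * sizeof(float)). According to the msync(2) man page, MS_ASYNC has been a no-op on Linux since 2.6.19 because the kernel tracks dirty pages itself, which may be part of why this kind of hint made no difference.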

Is there a way to fix this?

You could also use the madvise(2) syscall to inform the kernel about less useful pages (perhaps with MADV_SEQUENTIAL or MADV_DONTNEED ...). The posix_fadvise(2) syscall might be helpful for the file segment. Possibly readahead(2) (in another thread, since it is blocking) might also help.
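A minimal sketch of the madvise(2) suggestion, assuming Linux and a mapping obtained from mmap. The helper names adviseSequential and releaseRange are made up for illustration, and whether either call actually reduces the slowdown would have to be measured.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: tell the kernel the mapping is touched front to back.
// Returns 0 on success, -1 on error.
int adviseSequential(void* base, std::size_t length) {
    return madvise(base, length, MADV_SEQUENTIAL);
}

// Hypothetical helper: tell the kernel that the bytes [begin, end) of the
// mapping are not needed in the near future. madvise requires a page-aligned
// start address, so only the whole pages inside the range are passed on.
// For a MAP_SHARED mapping the data is kept by the backing object; for a
// private anonymous mapping the contents would be discarded instead.
int releaseRange(void* base, std::size_t begin, std::size_t end) {
    assert(begin <= end);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t first = (begin + page - 1) / page * page;  // round up
    const std::size_t last = end / page * page;                  // round down
    if (first >= last) return 0;  // no complete page in the range
    return madvise(static_cast<char*>(base) + first, last - first, MADV_DONTNEED);
}
```

With the mapping from the question, adviseSequential(memory, filesize) would be called once after mmap, and releaseRange(memory, startByte, endByte) after a block of rows has been filled.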

And the file might sit in a fast filesystem, perhaps a tmpfs one.
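To check whether a candidate location for the cache file really is on tmpfs, something like the following could be used. This is a Linux-specific sketch; the helper name isOnTmpfs is hypothetical, and it relies on statfs(2) and the TMPFS_MAGIC constant from linux/magic.h.

```cpp
#include <sys/vfs.h>
#include <linux/magic.h>
#include <string>

// Hypothetical helper (Linux only): returns true if `path` exists and lives on
// a tmpfs filesystem, false otherwise (including when statfs fails).
bool isOnTmpfs(std::string const& path) {
    struct statfs info;
    if (statfs(path.c_str(), &info) == -1) return false;  // cannot tell
    return static_cast<unsigned long>(info.f_type) ==
           static_cast<unsigned long>(TMPFS_MAGIC);
}
```

Keep in mind that tmpfs contents are held in RAM and swap, so a cache file placed there is limited by the same memory the question is trying to work around.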

Perhaps swapping on a fast disk (SSD) might also be useful; see the swapon(2) syscall (and the swapon command).
