Performance degradation when using mmap

Question

I have to compute a huge n x n matrix (n > 100000) and store it in memory for later use. Computing a single element is quite expensive (a few thousand flops and memory accesses), so I can't compute it on the fly. However, I only need to compute it once and do not need to modify it afterwards. I also can't assume that the system has enough swap space. That's why I decided to create a cache file and use mmap to map it into memory:

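#include <fcntl.h>     // open
#include <unistd.h>    // lseek, write
#include <sys/mman.h>  // mmap
#include <string>

int createCacheFile(std::size_t filesize, std::string const& filename){
    //create empty file
    int fileDescriptor = open(filename.c_str(), O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
    //stretch to desired size: seek to the last byte and write it,
    //since lseek alone does not change the file size
    lseek(fileDescriptor, filesize-1, SEEK_SET);
    write(fileDescriptor, "", 1);
    return fileDescriptor;
}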

//...
std::size_t n = 100000;
std::size_t fileSize = n*n*sizeof(float);
int fileDescriptor = createCacheFile(fileSize,"matrix.cache");
float* memory = (float*) mmap(0, fileSize, PROT_READ | PROT_WRITE, MAP_SHARED, fileDescriptor, 0);

//and now fill it...

I wanted to compare performance, so I tried a small n = 10000 and compared malloc, mmap with MAP_ANONYMOUS, and the implementation above. For this n the matrix fits completely into RAM. While malloc and MAP_ANONYMOUS give quite similar results, I get roughly a factor-of-10 penalty when computing the matrix when it is backed by a file. It seems that the program is regularly stopped by the kernel so that it can write the contents safely to the file. I tried to resolve this by calling msync and mprotect on the parts of the matrix that I had already computed, to hint to the kernel that it can write those sections out without stopping the program, but nothing helped.

Is there a way to fix this?

Answer 1

You could also use the madvise(2) syscall to inform the kernel about less useful pages (perhaps with MADV_SEQUENTIAL or MADV_DONTNEED ...). The posix_fadvise(2) syscall might be helpful for the file segment. Possibly readahead(2) (in another thread, since it is blocking) might help as well.

And the file might sit in a fast filesystem, perhaps a tmpfs one.

Perhaps swapping on a fast disk (SSD) might also be useful; see the swapon(2) syscall (and the swapon command).

Answer 1 (0, 2013-10-14 12:33:19)
