Keeping huge matrix in memory across multiple runs of a C++ program

Original post 2017-04-08 23:39:48 6 3 c++

I'm writing some C++ code (using the Eigen3 matrix library) to solve a convex optimization problem involving a huge sparse matrix. It takes a minute or so to read in the matrix from a file, and I don't want to have to do that every single time I run my program. (I'm going to be tuning the parameters in my optimization algorithm, which involves running my code many times in a row, and I don't want to have to wait one minute to read in the big matrix each time.)

Is there a way that I can keep this big matrix in memory while I change some parameters in my code, then recompile my code and run it again?

This kind of thing is easy in Matlab, but I don't know how it's handled in C++ (although this is a common situation, so there must be a standard approach that people take).

3 answers

Is there a way that I can keep this big matrix in memory while I change some parameters in my code, then recompile my code and run it again?

AFAIK, no operating system supports keeping the memory of a process around while it is not running and then "rerunning" the process.

You could try to:

improve the reading code for the matrix (or the representation it is stored in, as suggested by chtz).
keep the matrix loaded by a helper process, and use inter-process communication to work with it from the process containing your "main code" (which can then be (re)started and stopped at will).
implement some sort of "hot-swappable module" / hot code reloading.

But most of these will (though fun) be extremely complex to implement.

I'm going to be tuning the parameters in my optimization algorithm, which involves running my code many times in a row, and I don't want to have to wait one minute to read in the big matrix each time.

How about getting those parameters from user input instead of hard-coding them? That would allow you to specify the parameters, run your code, read in another set of parameters, do another run, ... without having to recompile your program or stop and restart the process.
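A minimal sketch of that idea, assuming the tunable parameters are a step size and an iteration count read as whitespace-separated pairs. The names load_matrix_once, run_solver and tuning_loop are hypothetical stand-ins, not anything from the question; the point is only that the expensive load happens once, outside the loop.

```cpp
#include <iostream>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for the expensive, one-minute matrix load.
std::vector<double> load_matrix_once() {
    return {1.0, 2.0, 3.0};
}

// Hypothetical stand-in for one optimization run. A real solver would use
// the matrix; this one just combines the data with the two parameters.
double run_solver(const std::vector<double>& data, double step, int iters) {
    double sum = 0.0;
    for (double v : data) sum += v;
    return sum * step * iters;
}

// Reads "step iters" pairs from `in` until EOF or malformed input and runs
// the solver once per pair, reusing the data that was loaded up front.
// Returns the number of runs performed.
int tuning_loop(std::istream& in, std::ostream& out) {
    const std::vector<double> data = load_matrix_once();  // paid only once
    int runs = 0;
    double step = 0.0;
    int iters = 0;
    while (in >> step >> iters) {
        out << "step=" << step << " iters=" << iters
            << " result=" << run_solver(data, step, iters) << '\n';
        ++runs;
    }
    return runs;
}
```

Calling tuning_loop(std::cin, std::cout) from main gives an interactive session: type a pair, see the result, type the next pair. This only helps when the changes are parameter values; changing the algorithm itself still means recompiling and reloading.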

Your case is a perfect example of why mmap() exists :)

mmap() (available on POSIX systems, with equivalents such as file mappings on Windows) allows you to treat a file on disk as regular RAM, with "direct" random read/write access and OS-backed paging support (much like what happens to your memory when it is swapped out by the OS's memory manager).
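For illustration, here is a small POSIX-only sketch of mapping a whole file read-only. MappedFile, map_file_readonly and unmap_file are names made up for this example. Pages are read lazily on first access, and on a second run they are often still in the OS page cache, which is where the speed-up would come from; that depends on memory pressure and is not guaranteed.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a whole file; `data` stays nullptr if mapping failed.
struct MappedFile {
    const void* data = nullptr;
    std::size_t size = 0;
};

// Maps the file at `path` into memory. No bytes are copied here; the
// kernel pages the file in as the mapped addresses are touched.
MappedFile map_file_readonly(const char* path) {
    MappedFile m;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return m;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        const std::size_t size = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            m.data = p;
            m.size = size;
        }
    }
    close(fd);  // the mapping remains valid after the descriptor is closed
    return m;
}

// Releases the mapping and resets the view.
void unmap_file(MappedFile& m) {
    if (m.data != nullptr) munmap(const_cast<void*>(m.data), m.size);
    m = MappedFile{};
}
```

For this to pay off, the file has to hold the matrix in the same binary layout the program uses in memory; mapping a text file would still leave the parsing cost.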

Is there a way that I can keep this big matrix in memory while I change some parameters in my code, then recompile my code and run it again?

Well, yes... But I have a feeling its implementation would be way outside the scope of your project. In essence, this is what you'd do:

Create a "loader" that would load the data into memory and make that memory "shared" (available to other processes).
Launch your code, providing it with that memory's handle (or address, depending on your platform) so it can request access to it.
When done, your code will quit, detaching from that shared memory, which is still going to be held by the loader process for the next launch of your code.
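The steps above could be sketched with System V shared memory, which is one possible mechanism on Unix-like systems (the answer does not name one, so this choice is an assumption). The functions publish_values, attach_values, detach_values and remove_values are hypothetical names, and a flat array of doubles stands in for the matrix. The segment id returned by shmget plays the role of the "handle" that gets passed to the main program, for example on its command line.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <vector>

// What the main program sees after attaching.
struct SharedView {
    void* base = nullptr;            // address returned by shmat
    const double* values = nullptr;  // payload, right after the count header
    std::size_t count = 0;
};

// Loader side: creates a segment laid out as [uint64 count][doubles...],
// copies the data in and returns the segment id, or -1 on failure.
int publish_values(const std::vector<double>& values) {
    const std::uint64_t count = values.size();
    const std::size_t payload = values.size() * sizeof(double);
    const int id = shmget(IPC_PRIVATE, sizeof count + payload, IPC_CREAT | 0600);
    if (id < 0) return -1;
    void* p = shmat(id, nullptr, 0);
    if (p == reinterpret_cast<void*>(-1)) {
        shmctl(id, IPC_RMID, nullptr);
        return -1;
    }
    std::memcpy(p, &count, sizeof count);
    if (payload > 0) {
        std::memcpy(static_cast<char*>(p) + sizeof count, values.data(), payload);
    }
    shmdt(p);  // the segment itself stays alive until it is removed
    return id;
}

// Main-program side: attaches read-only using the id handed over by the loader.
SharedView attach_values(int id) {
    SharedView v;
    void* p = shmat(id, nullptr, SHM_RDONLY);
    if (p == reinterpret_cast<void*>(-1)) return v;
    std::uint64_t count = 0;
    std::memcpy(&count, p, sizeof count);
    v.base = p;
    v.values = reinterpret_cast<const double*>(static_cast<const char*>(p) + sizeof count);
    v.count = static_cast<std::size_t>(count);
    return v;
}

// Detaches from the segment without destroying it.
void detach_values(SharedView& v) {
    if (v.base != nullptr) shmdt(v.base);
    v = SharedView{};
}

// Cleanup once tuning is finished: marks the segment for destruction.
bool remove_values(int id) {
    return shmctl(id, IPC_RMID, nullptr) == 0;
}
```

With this particular mechanism the segment persists until it is removed or the machine reboots, so the loader does not strictly have to stay running. Forgetting to call remove_values leaves the memory allocated. A real sparse matrix would need its index arrays and sizes stored in the segment as well.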

You can dump the data of your matrix in binary form: just dump everything pointed to by S.outerIndexPtr(), S.innerIndexPtr(), and S.valuePtr() (perhaps write all sizes at the start, if they are not always the same).

To read it again, just mmap your file and create a Map<SparseMatrix> from the correct start addresses.
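A rough sketch of the dump-and-map round trip follows. To keep it free of dependencies it works on raw compressed-column arrays rather than on Eigen types; CscView, dump_csc, map_csc, unmap_csc and multiply are hypothetical names, and the file layout (three 64-bit sizes, then values, then outer offsets, then inner indices) is an assumption of this example, not an Eigen format. With Eigen, and assuming S is a default column-major SparseMatrix<double> on which makeCompressed() has been called, the three arrays would come from S.valuePtr(), S.outerIndexPtr() and S.innerIndexPtr(); after mapping, the same three pointers plus rows, cols and nnz are, as far as I know, what the Map<SparseMatrix> constructor expects.

```cpp
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <fstream>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

constexpr std::int64_t kHeader = 3 * sizeof(std::int64_t);
constexpr std::int64_t kDouble = sizeof(double);
constexpr std::int64_t kInt = sizeof(int);

// Read-only view of a compressed-column matrix inside a mapped file.
struct CscView {
    std::int64_t rows = 0, cols = 0, nnz = 0;
    const double* values = nullptr;  // nnz entries
    const int* outer = nullptr;      // cols + 1 column start offsets
    const int* inner = nullptr;      // nnz row indices
    void* base = nullptr;            // start of the mapping, nullptr if unmapped
    std::size_t bytes = 0;
};

// Writes [rows, cols, nnz][values][outer][inner]. The doubles come first so
// they stay 8-byte aligned relative to the page-aligned mapping.
bool dump_csc(const char* path, std::int64_t rows, std::int64_t cols, std::int64_t nnz,
              const int* outer, const int* inner, const double* values) {
    std::ofstream f(path, std::ios::binary | std::ios::trunc);
    const std::int64_t header[3] = {rows, cols, nnz};
    auto put = [&f](const void* p, std::int64_t n) {
        f.write(static_cast<const char*>(p), static_cast<std::streamsize>(n));
    };
    put(header, kHeader);
    put(values, nnz * kDouble);
    put(outer, (cols + 1) * kInt);
    put(inner, nnz * kInt);
    f.flush();
    return f.good();
}

// Maps the file and points the view at the three arrays; nothing is copied.
CscView map_csc(const char* path) {
    CscView v;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return v;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size < kHeader) { close(fd); return v; }
    const std::size_t bytes = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping outlives the descriptor
    if (p == MAP_FAILED) return v;
    const auto* h = static_cast<const std::int64_t*>(p);
    const std::int64_t rows = h[0], cols = h[1], nnz = h[2];
    // Reject files whose size does not match what the header promises.
    const bool ok = rows >= 0 && cols >= 0 && nnz >= 0 &&
        st.st_size == kHeader + nnz * kDouble + (cols + 1) * kInt + nnz * kInt;
    if (!ok) { munmap(p, bytes); return v; }
    const char* c = static_cast<const char*>(p);
    v.rows = rows; v.cols = cols; v.nnz = nnz;
    v.values = reinterpret_cast<const double*>(c + kHeader);
    v.outer = reinterpret_cast<const int*>(c + kHeader + nnz * kDouble);
    v.inner = v.outer + cols + 1;
    v.base = p;
    v.bytes = bytes;
    return v;
}

void unmap_csc(CscView& v) {
    if (v.base != nullptr) munmap(v.base, v.bytes);
    v = CscView{};
}

// y = A * x, just to show the mapped arrays can be used directly.
std::vector<double> multiply(const CscView& a, const std::vector<double>& x) {
    std::vector<double> y(static_cast<std::size_t>(a.rows), 0.0);
    for (std::int64_t j = 0; j < a.cols; ++j)
        for (int k = a.outer[j]; k < a.outer[j + 1]; ++k)
            y[a.inner[k]] += a.values[k] * x[j];
    return y;
}
```

The dump is not portable across machines with different endianness or index widths, which should not matter for a local tuning loop. The mapping has to stay alive for as long as anything built on its pointers is in use.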