
Store a huge amount of data in memory

Original post: 2011-09-30 11:41:27 4 7 c++ / memory-management

I am looking for a way to store several GB of data in memory. The data is loaded into a tree structure. I want to be able to access this data from my main function, but I don't want to reload the data into the tree every time I run the program. What is the best way to do this? Should I create a separate program for loading the data and then call it from the main function, or are there better alternatives?

Thanks, Mads

7 answers

I would say the best option is to use a database. That would be your "separate program for loading the data".

If you are using a POSIX-compliant system, take a look at mmap.

I think Windows has a different API for memory-mapping a file (CreateFileMapping together with MapViewOfFile).
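For the POSIX case, a minimal sketch of mapping a data file read-only might look like the following. The names `MappedFile`, `map_file`, and `unmap_file` are made up for illustration; only `open`, `fstat`, `mmap`, `munmap`, and `close` are real system calls. The kernel pages the file in lazily, so starting the program does not read the whole file up front.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Read-only view of a whole file (hypothetical helper type).
struct MappedFile {
    const unsigned char* data = nullptr;
    std::size_t size = 0;
};

// Maps `path` read-only. Returns false on failure, including for empty
// files, which cannot be mapped.
bool map_file(const char* path, MappedFile& out) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return false;
    }

    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    out.data = static_cast<const unsigned char*>(p);
    out.size = size;
    return true;
}

// Releases the mapping and resets the view.
void unmap_file(MappedFile& m) {
    if (m.data) munmap(const_cast<unsigned char*>(m.data), m.size);
    m = MappedFile{};
}
```

Note that this only helps if the file's bytes are usable as they are; a tree built from heap pointers cannot simply be dumped to disk and mapped back.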

You could probably solve this with shared memory: have one long-lived process build the tree and expose its address, and then other processes that start up can attach to that same memory for querying. Note that in that case you will need to make sure the tree can safely be read by multiple processes at the same time. If the reads really are pure reads, that should be easy enough.
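On a POSIX system, the shared-memory idea could be sketched roughly as below, using `shm_open` and `mmap`. The struct `SharedCounters` and the three helper functions are hypothetical stand-ins for the real tree; the payload is deliberately plain data with no raw pointers, since each process may map the segment at a different address. On older glibc versions this may need linking with `-lrt`.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>      // O_CREAT, O_RDWR, O_RDONLY
#include <sys/mman.h>   // shm_open, shm_unlink, mmap, munmap
#include <unistd.h>     // ftruncate, close

// Plain-data payload standing in for the tree (hypothetical).
struct SharedCounters {
    int node_count;
    long checksum;
};

// Loader side: create the named segment, size it, and map it read-write.
SharedCounters* create_segment(const char* name) {
    assert(name != nullptr && name[0] == '/');
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, sizeof(SharedCounters)) != 0) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, sizeof(SharedCounters),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : static_cast<SharedCounters*>(p);
}

// Query side: attach to the existing segment read-only.
const SharedCounters* attach_segment(const char* name) {
    assert(name != nullptr && name[0] == '/');
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0) return nullptr;
    void* p = mmap(nullptr, sizeof(SharedCounters), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? nullptr : static_cast<const SharedCounters*>(p);
}

// Unmaps a view obtained from either function above.
void detach_segment(const void* p) {
    if (p) munmap(const_cast<void*>(p), sizeof(SharedCounters));
}
```

The loader process would call `create_segment` once and stay alive; each run of the main program would call `attach_segment` with the same name. The segment persists until `shm_unlink` is called on that name.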

You should look into a technique called memory-mapped files.

I think the best solution is to configure a cache server and put the data there.

Look into Ehcache:

Ehcache is an open source, standards-based cache used to boost performance, offload the database and simplify scalability. Ehcache is robust, proven and full-featured and this has made it the most widely-used Java-based cache.

It's written in Java, but it should support any language you choose:

The Cache Server has two APIs: RESTful resource oriented, and SOAP. Both support clients in any programming language.

You must be running a 64-bit system, and build the program as 64-bit, for a single process to use more than 4 GB of memory. If you build the tree and store it in a global variable, you can access the tree and its data from any function in the program. I suggest you try an alternative method that uses less memory. If you post what type of program this is and what type of tree you're building, I can perhaps help you find an alternative method.

Since you don't want to keep reloading the data, file storage and databases are out of the question, but several gigs of memory seems like a hefty price.

Also note that on Windows systems you can read the memory of another program using ReadProcessMemory(). You need a handle to that process opened with read access, plus the address of the memory you want to read.

Alternatively, you may implement the data loader as an executable program and the main program as a DLL that is loaded and unloaded on demand. That way you can keep the data in memory and still modify the processing code without reloading all the data or doing cross-process memory sharing.

Also, if you can operate on the raw data from the disk without any preprocessing of it (e.g. putting it in a tree, manipulating pointers to its internals), you may want to memory-map the data and avoid loading unused portions of it.
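One way to make a tree usable straight from a mapped file is to store it without pointers. The sketch below is an assumption about how that could look, not something from the question: `FlatNode` and `flat_find` are hypothetical names, and the layout is a binary search tree kept in a flat array where children are referenced by index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One node of a binary search tree stored in a flat array. Children are
// array indices rather than pointers, so the bytes mean the same thing
// wherever the array happens to be mapped.
struct FlatNode {
    std::int64_t key;
    std::int64_t value;
    std::int32_t left;   // index of the left child, or -1 if none
    std::int32_t right;  // index of the right child, or -1 if none
};

// Looks up `key`, starting from the root at index 0. `nodes` may point
// into a memory-mapped file. Returns true and sets `out` when found.
bool flat_find(const FlatNode* nodes, std::size_t count,
               std::int64_t key, std::int64_t& out) {
    assert(nodes != nullptr || count == 0);
    std::int32_t i = count ? 0 : -1;
    while (i >= 0 && static_cast<std::size_t>(i) < count) {
        const FlatNode& n = nodes[i];
        if (key == n.key) {
            out = n.value;
            return true;
        }
        i = key < n.key ? n.left : n.right;
    }
    return false;
}
```

With this layout the array can be written to disk once and later mapped and searched as is, and only the pages that a lookup actually touches are read from disk.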