
Loading data in memory in C++

I have the following data stored in a file on an SSD (the size of the data is 2GB). I want to load this data into memory so that, given Number1 and Number2, I can retrieve the list associated with them.

Number1  Number2  List(in sorted order. contains maximum 1000 elements)
12       1        5585,5587,5589,5590,5594,5597,5610,5615,5618,5619       
12       2        4561,4789,4980,5001,5008,5010,5100,5150,5240,5250
12       3        3010,3223,3225,3278,3890,4890,5001

13       1        3585,3587,3589,3590,3594,3597,3610,3615,3618,3619       
13       2        14561,14789,14980,15001,15008,15010,15100,15150,15240,15250
13       3        23010,23223,23225,23278,23890,24890,25001

14       1        1585,1587,1589,1590,1594,1597,1610,1615,1618,1619       
14       2        561,789,980,1001,1008,1010,1100,1150,1240,1250
14       3        1010,1223,1225,1278,1890,1891,15001
14       4        4,89,928,3958,95859

I am storing this data in std::map<unsigned,std::map<unsigned,vector<unsigned>>>, because given Number1 and Number2 I want to retrieve the list associated with them.

However, it turns out that reading this data from the file and storing it in std::map<unsigned,std::map<unsigned,vector<unsigned>>> in memory on a 64GB server takes 5 hours. Is there some other data structure I can use so that, given Number1 and Number2, I can efficiently retrieve the list associated with them? The data structure should also not take much time to load this data. Note that the range of Number2 (for a given Number1) is always from 1 to 10.

I am using: g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)

Here are my suggestions:

  1. The best solution really is to store the data in a database. There is not much point in implementing your own database when companies have been doing this for the past few decades. Just use one of them. You can use MySQL's MEMORY engine if you really want the data to be completely loaded in memory:

https://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.html

  2. If Number1 and Number2 are integers, then maybe you can combine them to form a single 64-bit integer, and then use that as the key in your dictionary.

  3. Using std::map in this case might be a bit slow, since it is internally implemented as a self-balancing binary tree, so its operations are O(log(n)). If you are OK with using C++11 features, then you can use std::unordered_map, which is implemented as a hash table, so operations are O(1) on average.
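Suggestions 2 and 3 can be combined. The following is only a minimal sketch, assuming both numbers fit in 32 bits; the names `make_key`, `Table`, and `find_list` are made up for illustration and do not come from the question.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper: pack Number1 into the high 32 bits and Number2 into
// the low 32 bits. Assumes both values fit in 32 bits.
inline std::uint64_t make_key(std::uint32_t number1, std::uint32_t number2) {
    return (static_cast<std::uint64_t>(number1) << 32) | number2;
}

// One flat hash table instead of a map of maps.
using Table = std::unordered_map<std::uint64_t, std::vector<std::uint32_t>>;

// Returns a pointer to the stored list, or nullptr if the pair is absent.
inline const std::vector<std::uint32_t>* find_list(const Table& table,
                                                   std::uint32_t number1,
                                                   std::uint32_t number2) {
    auto it = table.find(make_key(number1, number2));
    return it == table.end() ? nullptr : &it->second;
}
```

A list would be stored with `table[make_key(12, 3)] = {3010, 3223, 3225};` and fetched with `find_list(table, 12, 3)`. With GCC 4.8 this needs `-std=c++11`.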

You may try boost::multi_index_container. Here is an example, and there are many other examples you can check as well. I only know a little about this, but I hope it helps.

This is an interesting problem, and as usual you'll have to trade off speed against space. Your solution is pretty bad at both: with a std::map, this much data leaves your memory heavily fragmented, and lookups take time logarithmic in the number of entries, which is not optimal.

You could try:

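#include <cstdint>
#include <unordered_map>
#include <vector>

struct Value {
  std::vector<int> _values;  // the sorted list for one (Number1, Number2) pair
};
std::unordered_map<std::uint64_t, Value> values;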

The key of the unordered map will be Number1*100 + Number2. Since Number2 is always between 1 and 10, this key is unique for every pair.
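Loading the file into such a map could look roughly like the sketch below. It assumes the whitespace-separated layout shown in the question (Number1, Number2, then a comma-separated list with no spaces inside it); `Lists`, `make_key100`, and `load_lists` are hypothetical names, and the map stores the vector directly rather than the `Value` wrapper to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

using Lists = std::unordered_map<std::uint64_t, std::vector<unsigned>>;

// Key scheme from the answer; collision-free only while Number2 < 100.
inline std::uint64_t make_key100(std::uint64_t number1, unsigned number2) {
    assert(number2 < 100);
    return number1 * 100 + number2;
}

// Hypothetical loader: reads lines of the form "Number1 Number2 a,b,c".
// Blank lines and lines that do not start with two numbers are skipped.
inline Lists load_lists(std::istream& in) {
    Lists lists;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::uint64_t number1;
        unsigned number2;
        std::string csv;
        if (!(fields >> number1 >> number2 >> csv)) continue;
        std::vector<unsigned>& list = lists[make_key100(number1, number2)];
        std::istringstream items(csv);
        std::string item;
        while (std::getline(items, item, ',')) {
            if (!item.empty()) {
                list.push_back(static_cast<unsigned>(std::stoul(item)));
            }
        }
    }
    return lists;
}
```

Passing a `std::ifstream` opened on the data file to `load_lists` fills the map; the list for Number1 = 12 and Number2 = 1 is then `lists.at(1201)`. If the number of pairs is known in advance, calling `lists.reserve(n)` before the loop may avoid repeated rehashing during the load.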
