
Loading data in memory in C++

I have the following data stored in a file on SSD (the size of the data is 2GB). I want to load this data in-memory, such that given Number1 and Number2, I am able to retrieve the list associated with it.

Number1  Number2  List(in sorted order. contains maximum 1000 elements)
12       1        5585,5587,5589,5590,5594,5597,5610,5615,5618,5619       
12       2        4561,4789,4980,5001,5008,5010,5100,5150,5240,5250
12       3        3010,3223,3225,3278,3890,4890,5001

13       1        3585,3587,3589,3590,3594,3597,3610,3615,3618,3619       
13       2        14561,14789,14980,15001,15008,15010,15100,15150,15240,15250
13       3        23010,23223,23225,23278,23890,24890,25001

14       1        1585,1587,1589,1590,1594,1597,1610,1615,1618,1619       
14       2        561,789,980,1001,1008,1010,1100,1150,1240,1250
14       3        1010,1223,1225,1278,1890,1891,15001
14       4        4,89,928,3958,95859

I am storing this data in std::map<unsigned,std::map<unsigned,vector<unsigned>>>, since given Number1 and Number2 I want to retrieve the list associated with them.

However, it turns out that reading this data from the file and storing it in std::map<unsigned,std::map<unsigned,vector<unsigned>>> in memory on a 64GB server takes 5 hours. Is there some other data structure I can use such that, given Number1 and Number2, I can efficiently retrieve the list associated with them? The data structure should also not take long to load. Note that the range of Number2 (for a given Number1) is always 1 to 10.

I am using: g++ (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)

Here are my suggestions:

  1. The best solution really is to store the data in a database. There is not much point in implementing your own database when companies have been doing this for the past few decades. Just use one of them. You can use MySQL's MEMORY engine if you really want the data to be completely loaded in memory:

https://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.html

  2. If Number1 and Number2 are integers, then maybe you can combine them to form a single 64-bit integer, and then use that as the key in your dictionary.

  3. Using std::map in this case might be a bit slow, since it is internally implemented as a self-balancing binary tree, so its operations are O(log(n)). If you are OK using C++11 features, then you can use std::unordered_map, which is implemented as a hash table, so its operations are O(1) on average.
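Suggestions 2 and 3 can be combined. The following is only a minimal sketch, assuming both numbers fit in 32 bits; the names `Table`, `pack_key`, `insert_list` and `find_list` are made up for illustration and do not come from the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias: one flat hash table instead of a map of maps.
using Table = std::unordered_map<std::uint64_t, std::vector<unsigned>>;

// Pack Number1 into the high 32 bits and Number2 into the low 32 bits.
// Assumption: both values fit in 32 bits.
inline std::uint64_t pack_key(std::uint32_t number1, std::uint32_t number2) {
    return (static_cast<std::uint64_t>(number1) << 32) | number2;
}

// Store a list under (number1, number2). The question says the lists are
// already sorted, so this only checks that assumption in debug builds.
inline void insert_list(Table& table, std::uint32_t number1,
                        std::uint32_t number2, std::vector<unsigned> list) {
    assert(std::is_sorted(list.begin(), list.end()));
    table[pack_key(number1, number2)] = std::move(list);
}

// Return a pointer to the stored list, or nullptr if the pair is absent.
inline const std::vector<unsigned>* find_list(const Table& table,
                                              std::uint32_t number1,
                                              std::uint32_t number2) {
    auto it = table.find(pack_key(number1, number2));
    return it == table.end() ? nullptr : &it->second;
}
```

A lookup is then a single hash probe, e.g. `find_list(table, 14, 4)`, rather than two tree traversals. If the number of rows is known in advance, calling `table.reserve(n)` before loading avoids repeated rehashing.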

You may try boost::multi_index_container. Here is an example, and there are many other examples you can check as well. This is all I know of, and I hope it helps.

This is an interesting problem, and as usual you'll have to trade off speed against space. Your solution does poorly on both: with this much data, a map of maps leaves memory heavily fragmented, and lookups take logarithmic time, which is not optimal.

You could try:

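#include <cstdint>
#include <unordered_map>
#include <vector>

struct Value {
  std::vector<int> _values;  // the sorted list for one (Number1, Number2) pair
};
std::unordered_map<std::uint64_t, Value> values;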

The key of the unordered map will be Number1*100 + Number2. Since Number2 is always between 1 and 10, two different pairs can never produce the same key.
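As a rough sketch of how this structure could be filled from the text format shown in the question: the functions `make_key` and `load` below are hypothetical names, and the parser assumes whitespace-separated columns with a comma-separated list containing no empty items. It uses `unsigned` for the list elements, as in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Value {
    std::vector<unsigned> _values;
};

using Table = std::unordered_map<std::uint64_t, Value>;

// Key scheme from the answer: Number1*100 + Number2.
// Relies on Number2 being in 1..10, as stated in the question.
inline std::uint64_t make_key(unsigned number1, unsigned number2) {
    assert(number2 >= 1 && number2 <= 10);
    return static_cast<std::uint64_t>(number1) * 100 + number2;
}

// Read "Number1 Number2 a,b,c,..." lines. Lines that do not start with two
// numbers (the header row, blank separator lines) are skipped.
inline void load(std::istream& in, Table& table) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        unsigned number1 = 0, number2 = 0;
        std::string csv;
        if (!(fields >> number1 >> number2 >> csv)) continue;

        Value value;
        std::istringstream items(csv);
        std::string item;
        while (std::getline(items, item, ','))
            value._values.push_back(static_cast<unsigned>(std::stoul(item)));

        table[make_key(number1, number2)] = std::move(value);
    }
}
```

With the file opened as a `std::ifstream`, `load(file, values)` fills the table in one pass, and `values.find(make_key(14, 4))` then retrieves a list. Whether this loads fast enough on the 2GB file would need to be measured.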
