
Minimizing memory overhead using C++ containers (std::map and std::vector too expensive)

I expect to be handling a huge number of data records, where each of around 20 uint8_t keys will have millions of <int, struct> pairs associated with it (ordered by the int). These pairs are rather lightweight, at ~10 bytes each, and need to be allocated dynamically.

Initially, I was using a std::map<uint8_t, std::vector<std::pair<int, struct>>>, but then I studied the overhead associated with vectors, namely the capacity() term in

3 machine words in total + sizeof(element) * capacity()

as seen here: capacity() "typically has room for up to twice the actual number of elements", which seems detrimental.
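Here is a minimal sketch for measuring this overhead on your own standard library. It assumes a hypothetical Payload struct of three shorts (per the edit) and the hypothetical helper names grow_by_push_back and unused_bytes. It fills a vector using push_back alone and reports how many allocated bytes hold no element. The growth factor is implementation-defined, so the number will differ between standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical payload: three shorts, as described in the question's edit.
struct Payload {
    short a, b, c;
};

using Record = std::pair<int, Payload>;

// Bytes the vector has allocated that do not (yet) hold an element.
// The growth factor is implementation-defined (commonly 1.5x or 2x),
// so the result differs between standard libraries.
std::size_t unused_bytes(const std::vector<Record>& v) {
    return (v.capacity() - v.size()) * sizeof(Record);
}

// Fills a vector with n records using push_back only, so it grows geometrically.
std::vector<Record> grow_by_push_back(std::size_t n) {
    std::vector<Record> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(Record{static_cast<int>(i), Payload{0, 0, 0}});
    }
    return v;  // moved or elided, so the grown capacity is preserved
}
```

After grow_by_push_back(1000), for example, unused_bytes can be nonzero because capacity() is allowed to exceed 1000.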

Instead of a vector, I could use a std::map; however, its overhead of ~32 bytes per node also becomes very expensive for such lightweight pairs.
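If the vector is kept sorted by the int, as described above, std::lower_bound can answer map-style lookups on it in O(log n) with no per-element node overhead. Here is a minimal sketch, where Payload, Record and find_record are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical payload of three shorts.
struct Payload {
    short a, b, c;
};

using Record = std::pair<int, Payload>;

// Returns a pointer to the payload stored under `key`, or nullptr if absent.
// Precondition: `records` is sorted by the int member (the order the question
// already keeps). Binary search, so O(log n) and no per-node allocations.
const Payload* find_record(const std::vector<Record>& records, int key) {
    auto it = std::lower_bound(
        records.begin(), records.end(), key,
        [](const Record& r, int k) { return r.first < k; });
    if (it != records.end() && it->first == key) {
        return &it->second;
    }
    return nullptr;
}
```

Inserting into the middle of a sorted vector is O(n), so this suits data that is appended in order, or bulk-loaded and sorted once.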

I am unfamiliar with Boost and other C++ libraries, so I was wondering whether anyone could advise on a solution that avoids manual dynamic memory allocation.


Edit: To clarify, following a few questions in the comments: the struct stored will contain 3 shorts (to start with) and no further data structures. I anticipate the length of the vector to be no greater than 1.5 × 10^8, and understand this comes to ~1.4 GiB (thanks @dyp).
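Padding is one detail worth checking here. The sketch below assumes the struct is exactly three shorts (hypothetically named Payload). On common platforms with a 4-byte int, std::pair<int, Payload> is padded to 12 bytes rather than 10. At that size, 1.5 × 10^8 elements need about 1.68 GiB of element storage before any spare capacity. The static_assert and the hypothetical storage_gib helper let you confirm this on your own compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical struct with the three shorts from the edit.
struct Payload {
    short a, b, c;
};

using Record = std::pair<int, Payload>;

// Three shorts need no internal padding on mainstream platforms (6 bytes),
// but the pair is padded to a multiple of alignof(int): with a 4-byte int,
// sizeof(Record) is typically 12, not 10.
static_assert(sizeof(Payload) == 3 * sizeof(short), "unexpected padding in Payload");

// Element storage for n records in GiB, ignoring any unused vector capacity.
constexpr double storage_gib(std::size_t n) {
    return static_cast<double>(n) * sizeof(Record) / (1024.0 * 1024.0 * 1024.0);
}
```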

I suppose the question is really how to manage vector capacity() so that reallocation through reserve() is kept to a minimum. I am also unsure how efficient shrink_to_fit() (C++11) is.
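If you know an upper bound on each key's element count in advance, a single reserve() avoids every reallocation during filling. That includes the moment when the old and new buffers would otherwise coexist. shrink_to_fit() is only a non-binding request. Implementations typically allocate a new buffer of exactly size() and move the elements into it, so the call briefly needs both buffers. Here is a minimal sketch; build_records and its parameters are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical payload of three shorts.
struct Payload {
    short a, b, c;
};

using Record = std::pair<int, Payload>;

// Builds one key's vector when an upper bound on its length is known up front.
// reserve() performs one allocation, so push_back never reallocates below.
std::vector<Record> build_records(std::size_t expected_max, std::size_t actual) {
    std::vector<Record> v;
    v.reserve(expected_max);
    for (std::size_t i = 0; i < actual && i < expected_max; ++i) {
        v.push_back(Record{static_cast<int>(i), Payload{1, 2, 3}});
    }
    v.shrink_to_fit();  // non-binding: may release the unused tail, not guaranteed
    return v;
}
```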

Following up on @NielKirk's point about using std::vector<> instead of a map for the keys: with only 256 possible key values, you could also consider std::array<> (or even a C-style array) for the keys.
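A uint8_t can only take 256 values, so the key itself can serve as the array index, and the outer lookup becomes a plain array access. Here is a minimal sketch, assuming hypothetical Payload/Record types and a hypothetical add_record helper:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical payload of three shorts.
struct Payload {
    short a, b, c;
};

using Record = std::pair<int, Payload>;

// One slot per possible uint8_t key: every value 0..255 is a valid index,
// so there is no tree or hash lookup for the outer key.
using KeyedRecords = std::array<std::vector<Record>, 256>;

void add_record(KeyedRecords& table, std::uint8_t key, int id, Payload p) {
    table[key].push_back(Record{id, p});
}
```

Each unused key costs only an empty vector (typically three pointers). On a 64-bit platform that comes to about 6 KiB for all 256 slots.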

As for the std::pair<int, struct> elements, your initial implementation had them as members of a std::vector<std::pair<int, struct>> collection, and you said

Instead of a vector, I could use a std::map; however, its overhead of ~32 bytes per node also becomes very expensive for such lightweight pairs.

which implies that the int part of each element is unique, since you did not mention std::multimap. You could take a look at Google sparsehash (http://code.google.com/p/sparsehash/). From the project home page:

An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed.

These hashtable implementations are similar in API to SGI's hash_map class and the tr1 unordered_map class, but with different performance characteristics. It's easy to replace hash_map or unordered_map by sparse_hash_map or dense_hash_map in C++ code.

I've used it before and never had a problem with it. Your uint8_t keys could index into a collection KCH of hash maps (a std::vector, std::array, or C-style array). If you wanted to, you could even define KCH as a collection of objects, each containing a hash map, so that each KCH[i] implements a convenient interface for working with the std::pair<int, struct> objects for that key. You'd have a "bad key" element as the default for non-key entries in the collection, referencing either a) a single empty dummy hash map or b) a "bad key object" that handles an unexpected key value appropriately.

Something like this:

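#include <array>
#include <sparsehash/sparse_hash_map>   // sparsehash 1.x: <google/sparse_hash_map>

struct MyStruct { short a, b, c; };      // hypothetical name; a bare "struct" is not a type

// Map int -> MyStruct: elements are then std::pair<const int, MyStruct>,
// so the int key is not stored a second time inside the mapped value.
typedef google::sparse_hash_map<int, MyStruct>  myCollectionType;
typedef myCollectionType::value_type             myPair;
typedef myCollectionType::iterator               myCollectionIter;

myCollectionType dummyHashMap;
std::array<myCollectionType*, 256> keyedArray;  // pointers, so unused keys can share dummyHashMap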

Initialize all keyedArray elements to dummyHashMap, then fill in separate hash maps for the valid keys.
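The dummy-map idea can be sketched without sparsehash installed. This is an illustration under stated assumptions, not the library's API. std::unordered_map stands in for google::sparse_hash_map, since the quoted project page says their APIs are similar. KeyedCollections, addValidKey, lookup and writable are hypothetical names. A read through an invalid key sees an empty map, and a write through one is refused, so nothing ever lands in the shared dummy.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>

// Hypothetical payload of three shorts.
struct Payload {
    short a, b, c;
};

// Stand-in for google::sparse_hash_map<int, Payload>.
using myCollectionType = std::unordered_map<int, Payload>;

// Hypothetical wrapper owning the per-key maps and the shared dummy map.
class KeyedCollections {
public:
    KeyedCollections() { keyedArray.fill(&dummyHashMap); }
    KeyedCollections(const KeyedCollections&) = delete;  // slots point into *this
    KeyedCollections& operator=(const KeyedCollections&) = delete;

    // Gives `key` its own map; does nothing if it already has one.
    void addValidKey(std::uint8_t key) {
        if (!isValidKey(key)) {
            owned.emplace_back();  // deque keeps existing elements in place
            keyedArray[key] = &owned.back();
        }
    }

    bool isValidKey(std::uint8_t key) const { return keyedArray[key] != &dummyHashMap; }

    // Reads need no special case: invalid keys see the empty dummy map.
    const myCollectionType& lookup(std::uint8_t key) const { return *keyedArray[key]; }

    // Writes return nullptr for unregistered keys, so the dummy map stays empty.
    myCollectionType* writable(std::uint8_t key) {
        return isValidKey(key) ? keyedArray[key] : nullptr;
    }

private:
    myCollectionType dummyHashMap;
    std::deque<myCollectionType> owned;
    std::array<myCollectionType*, 256> keyedArray;
};
```

The owned maps live in a std::deque because push_back on a deque never moves existing elements, so the pointers stored in keyedArray stay valid.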

Similarly, with wrapper objects:

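class KeyedCollectionHandler {
public:
    virtual ~KeyedCollectionHandler() = default;  // allows deletion through a base pointer
    virtual bool whatever(int parm);              // placeholder; the int parameter type is hypothetical
    // ...

private:
    myCollectionType collection;
};

class BadKeyHandler : public KeyedCollectionHandler
{
public:
    bool whatever(int /*parm*/) override {
        // unknown or unexpected key, handle appropriately
        return false;
    }
    // ...
};

BadKeyHandler badKeyHandler;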

Initialize all 256 keyed array elements to badKeyHandler, then fill in KeyedCollectionHandler objects for the valid key values.
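Here is a runnable version of the handler approach. Two hypothetical operations, insert and find, replace the placeholder whatever(parm), and std::unordered_map again stands in for sparse_hash_map. HandlerTable is likewise a hypothetical name for the 256-slot array of handler pointers:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical payload of three shorts.
struct Payload {
    short a, b, c;
};

// Stand-in for google::sparse_hash_map<int, Payload>.
using myCollectionType = std::unordered_map<int, Payload>;

class KeyedCollectionHandler {
public:
    virtual ~KeyedCollectionHandler() = default;

    // Returns false if `id` is already present.
    virtual bool insert(int id, const Payload& p) {
        return collection.emplace(id, p).second;
    }
    virtual const Payload* find(int id) const {
        auto it = collection.find(id);
        return it == collection.end() ? nullptr : &it->second;
    }

private:
    myCollectionType collection;
};

// Shared handler for every uint8_t value that is not a real key.
class BadKeyHandler : public KeyedCollectionHandler {
public:
    bool insert(int, const Payload&) override { return false; }  // reject writes
    const Payload* find(int) const override { return nullptr; }  // nothing stored
};

// Every slot starts at the shared bad-key handler; valid keys are repointed later.
struct HandlerTable {
    BadKeyHandler badKeyHandler;
    std::array<KeyedCollectionHandler*, 256> slots;

    HandlerTable() { slots.fill(&badKeyHandler); }
    HandlerTable(const HandlerTable&) = delete;  // slots point into *this
    HandlerTable& operator=(const HandlerTable&) = delete;

    KeyedCollectionHandler& operator[](std::uint8_t key) { return *slots[key]; }
};
```

Callers always go through table[key] and never test the key first. Virtual dispatch sends invalid keys to BadKeyHandler.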
