
Calculating size of vector of vectors in bytes

#include <vector>
using std::vector;

typedef vector<vector<short>> Mshort;
typedef vector<vector<int>> Mint;

Mshort mshort(1 << 20, vector<short>(20, -1)); // Xcode shows 73MB
Mint mint(1 << 20, vector<int>(20, -1)); // Xcode shows 105MB

short uses 2 bytes and int 4 bytes; please note that 1 << 20 = 2^20.

I am trying to calculate the memory usage ahead of time (on paper) but I am unable to.

sizeof(vector<>) // = 24 //no matter what type
sizeof(int) // = 4
sizeof(short) // = 2

I do not understand this: mint should use double the memory of mshort, but it doesn't. When running the program with only the mshort initialisation, Xcode shows 73MB of memory usage; for mint it shows 105MB.

mshort.size() * mshort[0].size() * sizeof(short) * sizeof(vector<short>) // = 1006632960
mint.size() * mint[0].size() * sizeof(int) * sizeof(vector<int>) // = 2013265920

//no need to use .capacity() because I fill vectors with -1
1006632960 * 2 = 2013265920

How does one calculate how much RAM a 2D std::vector or a 2D std::array will use?

I know the sizes ahead of time and each row has the same number of columns.

The memory usage of your vectors of vectors will be, e.g.:

// the size of the data...
mshort.size() * mshort[0].size() * sizeof(short) +

// the size of the inner vector objects...
mshort.size() * sizeof mshort[0] +

// the size of the outer vector object...
// (this is ostensibly on the stack, given your code)
sizeof mshort +

// dynamic allocation overheads
overheads

The dynamic allocation overheads are because the vectors internally new memory for the elements they're to store, and for speed reasons they may have pools of fixed-size memory areas waiting for new requests, so if a vector effectively does a new short[20] - with the data needing 40 bytes - it might end up with e.g. 48 or 64. The implementation may also need some extra memory to store the array size, though for short and int there's no need to loop over the elements invoking destructors during delete[], so a good implementation will avoid that extra allocation and no-op destruction behaviour.
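As a rough sketch, the first three terms (everything except the allocation overheads) can be computed directly; the figure in the comment below assumes sizeof(std::vector) == 24, as reported in the question:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t rows = 1 << 20, cols = 20;

    // the size of the data...
    std::size_t data  = rows * cols * sizeof(short);
    // the size of the inner vector objects...
    std::size_t inner = rows * sizeof(std::vector<short>);
    // the size of the outer vector object...
    std::size_t outer = sizeof(std::vector<std::vector<short>>);

    std::cout << data + inner + outer << '\n'; // 67108888 when sizeof(vector) == 24
}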

The actual data elements for any given vector are contiguous in memory though, so if you want to reduce the overheads, you can change your code to use fewer, larger vectors. For example, using one vector with (1 << 20) * 20 elements will have negligible overhead - then rather than accessing [i][j] you can access [i * 20 + j] - you can write a simple class wrapping the vector to do this for you, most simply with a v(i, j) notation...

inline short& operator()(size_t i, size_t j) { return v_[i * 20 + j]; }
inline short operator()(size_t i, size_t j) const { return v_[i * 20 + j]; }

...though you could support v[i][j] by having v.operator[] return a proxy object that can be further indexed with []. I'm sure if you search SO for questions on multi-dimensional arrays there'll be some examples - I think I may have posted such code myself once.
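A minimal sketch of such a v(i, j) wrapper, assuming one contiguous std::vector<short> as the backing store (the class name and layout here are illustrative, not from the original answer):

#include <cstddef>
#include <vector>

// One contiguous buffer indexed as m(i, j) instead of m[i][j].
class ShortMatrix {
public:
    ShortMatrix(std::size_t rows, std::size_t cols, short init = 0)
        : cols_(cols), v_(rows * cols, init) {}

    short& operator()(std::size_t i, std::size_t j)       { return v_[i * cols_ + j]; }
    short  operator()(std::size_t i, std::size_t j) const { return v_[i * cols_ + j]; }

private:
    std::size_t cols_;
    std::vector<short> v_;
};

// Usage: ShortMatrix m(1 << 20, 20, -1);  m(5, 3) = 42;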

The main reason to want vector<vector<x>> is when the inner vectors vary in length.

Assuming glibc malloc: each memory chunk carries an additional 8-16 bytes (2 * size_t) of header for the memory block; on a 64-bit system that is 16 bytes. See the code: https://github.com/sploitfun/lsploits/blob/master/glibc/malloc/malloc.c#L1110

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if allocated            | |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of chunk, in bytes                       |M|P|
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             User data starts here...                          .
        .                                                               .
        .             (malloc_usable_size() bytes)                      .
        .                                                               |
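On glibc you can observe the rounding and usable size of an individual allocation with malloc_usable_size(); a small glibc-specific sketch:

#include <cstdio>
#include <cstdlib>
#include <malloc.h>  // malloc_usable_size() - glibc-specific

int main() {
    // One row's payload: 20 shorts = 40 bytes.
    void* p = std::malloc(20 * sizeof(short));
    // glibc reports the usable size of the chunk it actually handed out;
    // the chunk header (the size fields in the diagram above) sits in front
    // of the returned pointer and is not included in this figure.
    std::printf("requested 40, usable %zu\n", malloc_usable_size(p));
    std::free(p);
}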

That gives me approximately 83886080 for short when adding a 16-byte header per row:

sizeof(vector)(24) + header(16) + mshort.size()(1048576) * (mshort[0].size()(20) * sizeof(short)(2) + sizeof(vector<short>)(24) + header(16))

The same formula gives approximately 125829120 for int.

But then I recomputed your numbers and it looks like you are on 32-bit...

  • short: 75497472, which is ~72 MiB
  • int: 117440512, which is ~112 MiB

That looks very close to the reported figures.

Use capacity(), not size(), to get the number of allocated items, even if the two are the same in your case.

Allocating a single vector of size rows * columns will save you header * 1048576 bytes.
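A quick sketch of that per-row arithmetic; the vector-object size (24) and header size (16) below are the 64-bit assumptions used above and will differ on other platforms and allocators:

#include <cstdio>

// Per-row estimate: payload + inner vector object + one allocation header,
// plus the outer vector object and its header.
unsigned long long estimate(unsigned long long rows, unsigned long long cols,
                            unsigned long long elem_size,
                            unsigned long long vec_size,
                            unsigned long long header) {
    return vec_size + header + rows * (cols * elem_size + vec_size + header);
}

int main() {
    std::printf("short: %llu\n", estimate(1 << 20, 20, 2, 24, 16)); // 83886120
    std::printf("int:   %llu\n", estimate(1 << 20, 20, 4, 24, 16)); // 125829160
}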

Your calculation mshort.size() * mshort[0].size() * sizeof(short) * sizeof(vector<short>) // = 1006632960 is simply wrong. According to that calculation, mshort would take 1006632960 bytes, which is 960 MiB; that is clearly not true.

Let's ignore libc's overhead and just focus on std::vector<>'s size: mshort is a vector of 2^20 items, each of which is a vector<short> with 20 items. So the size shall be:

mshort.size() * mshort[0].size() * sizeof(short)  // size of all the short values
+ mshort.size() * sizeof(vector<short>)           // size of 2^20 vector<short> objects
+ sizeof(mshort)                                  // size of mshort itself, negligible overhead

The calculated size is 64 MiB.

The same goes for mint, where the calculated size is 104 MiB.

So mint is simply NOT double the size of mshort.
