简体   繁体   中英

Memory usage of large 2D static array and vector of vectors

I need to use large matrix 20000 * 20000 for a machine learning project. When it is implemented as static array, it uses approximately 1.57 GB of memory. If it is implemented with vector of vectors it uses much more memory then the static array (approximately 3.06 GB). I could not figure out the reason behind it.

Array version:

static double distanceMatrix[20000][20000] = {0};

Vector of vectors:

vector<vector<double>> distanceMatrix(20000, vector<double> (20000));

I used them to store the distances between points.

for (int i = 0; i < 20000; i++){
    for (int j = i+1; j < 20000; j++)
        distanceMatrix[i][j] = euclid_dist(pointVec[i], pointVec[j]);
}

I also observed that when I used array version memory usage increases gradually during the nested loop. However, while using vector of vectors, memory usage reaches 3.06 GB then nested loop starts.

I checked the memory usage with Xcode debug navigator and Activity Monitor. Thanks in advance!

这是因为vector的内存分配策略,当达到其限制(依赖于实现)时可能是newsize = oldsize * const,另请参阅向量内存分配策略

First of all the array doesn't take 1,57 GB of memory. So there is an issue with the measurement.

Experiment with the static array

When running the following code in Xcode, you'll find out that the array is exactly 3,2 Gb in size:

const size_t matsize=20000;
static double mat2D[matsize][matsize] = {0};  
cout<<"Double:             " << sizeof (double) <<endl;
cout<<"Array:              " << sizeof mat2D <<endl;
cout<<"                    " << sizeof(double)*matsize*matsize<<endl;
// ... then your loop

When the programme starts, its reported memory consumption is only 5,3MB before entering into the loop, although the static array is already there. Once the loop finished, the reported memory consumption is 1,57 Gb as you explained. But still not the 3,2Gb that we could expect.

The memory consumption figure that you read is the physical memory used by your process. The remaining memory of the process is in the virtual memory, which is much larger (7 Gb during my experiment).

Experiment of the vector

First, let's look at the approximate memory size of the vector, knowing that each vector has a fixed size plus a dynamically allocated variable size (based on the capacity which can be equal or greater than the number of elements actually stored in the vector). The following code can give you some estimates:

vector<vector<double>> mat2D(matsize, vector<double> (matsize));
cout<<"Vector (fixed part):" << sizeof (vector<double>)<<endl;
cout<<"Vector capacity:   " << mat2D[0].capacity()<<endl;
cout<<"       (variable):  " << sizeof(double)*mat2D[0].capacity()<<endl;
cout<<"       (total):     " << sizeof(double)*mat2D[0].capacity() + sizeof(mat2D[0])<<endl;
cout<<"Vector of vector:   " << (sizeof(double)*mat2D[0].capacity() + sizeof(mat2D[0]))*matsize+sizeof(mat2D)<<endl;
// then your loop 

Running this code will show that the memory required to store your vector of vector is about 3,2 Gb + 480 Kb overhead (24 bytes per vector * 2001 vectors).

Before entering the loop, we will notice that already 3 Gb of physical memory is used. So MacOS uses twice the physical memory for this dynamic version compared to the array version. This is certainly because the allocation process requires to really access more memory upfront: each of the 2000 lines requires a separate allocation and initialization.

Conclusion

The overall virtual memory used in the two approaches is not that different. I could measure around 7Gb running it in debug mode with Xcode. The vector variant uses a little bit more than previously due to the extra 480Kb overhead for vectors.

The strategy used by macOS for using more or less physical memory is complex and may depend on many factors (including the access patterns), the goal being to find the best tradeoff between physical memory available and cost of swapping.

But the physical memory usage is not representative of the memory consumption by the process.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM