How to get the memory used by a multidimensional vector

Question

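I am currently writing some code to create a neural network, and I am trying to make it as optimized as possible. I would like to get the amount of memory consumed by an object of type Network, because memory usage is very important for avoiding cache misses. I tried using sizeof(), but that doesn't work: I assume vectors store their values on the heap, so sizeof() only tells me the size of the pointers. Here is my code so far.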

#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include <cstdlib>   // std::rand, RAND_MAX

class Timer
{
private:
    std::chrono::time_point<std::chrono::high_resolution_clock> start_time;
public:
    Timer(bool auto_start=true)
    {
        if (auto_start)
        {
            start();
        }
    }
    void start()
    {
        start_time = std::chrono::high_resolution_clock::now();
    }
    float get_duration()
    {
        std::chrono::duration<float> duration = std::chrono::high_resolution_clock::now() - start_time;
        return duration.count();
    }
};

class Network
{
public:
    std::vector<std::vector<std::vector<float>>> weights;
    std::vector<std::vector<std::vector<float>>> deriv_weights;
    std::vector<std::vector<float>> biases;
    std::vector<std::vector<float>> deriv_biases;
    std::vector<std::vector<float>> activations;
    std::vector<std::vector<float>> deriv_activations;
};

Network create_network(std::vector<int> layers)
{
    Network network;
    network.weights.reserve(layers.size() - 1);
    int nodes_in_prev_layer = layers[0];
    for (unsigned int i = 0; i < layers.size() - 1; ++i)
    {
        int nodes_in_layer = layers[i + 1];
        network.weights.push_back(std::vector<std::vector<float>>());
        network.weights[i].reserve(nodes_in_layer);
        for (int j = 0; j < nodes_in_layer; ++j)
        {
            network.weights[i].push_back(std::vector<float>());
            network.weights[i][j].reserve(nodes_in_prev_layer);
            for (int k = 0; k < nodes_in_prev_layer; ++k)
            {
                float input_weight = float(std::rand()) / RAND_MAX;
                network.weights[i][j].push_back(input_weight);
            }
        }
        nodes_in_prev_layer = nodes_in_layer;
    }
    return network;
}

int main() 
{
    Timer timer;
    Network network = create_network({784, 800, 16, 10});
    std::cout << timer.get_duration() << std::endl;
    std::cout << sizeof(network) << std::endl;
    std::cin.get();
}

Answer 1

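I recently updated our production neural network code to AVX-512, so this is definitely real-world production code. A key part of our optimization is that each matrix is not a std::vector of vectors but a one-dimensional, AVX-aligned array. Even without AVX alignment, we saw a huge benefit from switching to a single 1D array backing each matrix. It means memory access is completely sequential, which is much faster. The size is then simply (rows*cols)*sizeof(float).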

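We store the biases as the first full row. Usually this is done by prepending a 1.0 element to the inputs, but in our AVX code we use the biases as the starting values of the FMA (fused multiply-add) operations, i.e. in pseudocode: result=bias; for(input:inputs) result+=(input*weight). This also keeps the inputs AVX-aligned.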

Since the matrices are used one after another, you can safely have a std::vector<Matrix> layers.

Answer 2

As quoted from https://stackoverflow.com/a/17254518/7588455:
A vector stores its elements in an internally allocated memory array. You can do this:

sizeof(std::vector<int>) + (sizeof(int) * MyVector.size())

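This gives you the size of the vector structure itself plus the size of all the ints in it, but it may not include the small overhead your memory allocator might impose. I'm not sure there is a platform-independent way to include that.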

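In your case, only the actual internally allocated memory arrays matter, since those are what you are accessing. Also pay attention to how you access memory.
To write cache-friendly code, I strongly recommend reading this SO post: https://stackoverflow.com/a/16699282/7588455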


2 solutions

Solution 1: score 1, 2020-04-14 12:29:36

Solution 2: score -1, 2020-04-14 12:14:52
