Memory use in cuda cusp linear solver

I am using cusp::bicgstab to solve a linear system Ax=b, in which A is a 3D Poisson operator on an MxNxP mesh, x is the vector of unknowns, and b is the RHS. I have a Tesla K40m, which has 12GB of memory.

I tested with M=2000, N=2000, P=20 (80 million unknowns) and double as the variable type, so the total memory used (for A, x, b, and others) is approximately 5.5GB. The code works fine.

Then I increased the value of M or N to 2500 (the memory used is still far less than 12GB), and the program failed with the following error:

terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
  what():  std::bad_alloc: out of memory
Aborted (core dumped)

The error indicates that the device ran out of memory, so I am wondering how the cusp library manages memory. During the iterations to solve the system, does it use about as much additional memory for extra variables as is already used for A, x, and b?

Below is my code:

#include <iostream>
#include <cuda.h>
#include <cuda_runtime_api.h>

#include <cusp/monitor.h>
#include <cusp/krylov/bicgstab.h>
#include <cusp/gallery/poisson.h>
#include <cusp/print.h>

// where to perform the computation
typedef cusp::device_memory MemorySpace;

// which floating point type to use
typedef double ValueType;

int main(int argc, char **argv)
{
    size_t avail, total;                // Available and Total memory count
    int N = 2500, M = 2000, P = 20;     // Dimension

    // create a matrix for a 3D Poisson problem on a MxNxP grid
    cusp::dia_matrix<int, ValueType, MemorySpace> A;
    cusp::gallery::poisson7pt(A, N, M, P);

    // allocate storage for solution (x) and right hand side (b)
    cusp::array1d<ValueType, MemorySpace> x(N*M*P, 0.0);
    cusp::array1d<ValueType, MemorySpace> b(N*M*P, 1.0);

    // set preconditioner (identity)
    cusp::identity_operator<ValueType, MemorySpace> ID(A.num_rows, A.num_rows);

    // Set stopping criteria:
    // ... iteration_limit    = 100
    // ... relative_tolerance = 1e-9
    // ... absolute_tolerance = 0
    cusp::default_monitor <ValueType> monitor(b, 100, 1e-9);

    // solve the linear system A x = b
    cusp::krylov::bicgstab(A, x, b, monitor, ID);

    // Get device memory usage
    cudaMemGetInfo( &avail, &total );
    size_t used = total - avail;
    std::cout << "Device memory used: " << used/(1024.*1024.*1024.) << " GiB " << std::endl;

    return 0;
}

You can read the source for the bicgstab solver yourself, but it looks like there are eight temporary arrays, each with the same number of entries as there are rows in your matrix. If I have read your code correctly, that means you would need at least 8 * N * M * P * sizeof(double) bytes of free GPU memory on entry to the bicgstab call for the solver to run.
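To put rough numbers on that, here is a small host-only sketch of the arithmetic. It is an estimate under stated assumptions, not a measurement: it takes the count of eight temporaries from the reading above, assumes the 7-point DIA matrix stores seven full-length diagonals of doubles, and ignores padding, the diagonal offsets, and the CUDA context. The names MemoryEstimate and estimate_bicgstab_memory are made up for this illustration and are not part of cusp.

```cpp
#include <cstddef>

// Rough device-memory estimate for the bicgstab call in the question.
// Assumptions (illustrative, not taken from the cusp source):
//   - A is stored as 7 full-length diagonals of doubles (DIA format),
//   - padding, diagonal offsets and the CUDA context are ignored,
//   - the solver allocates `num_temporaries` vectors of length n*m*p.
struct MemoryEstimate {
    double matrix_gib;     // the 7 diagonals of A
    double vectors_gib;    // x and b
    double workspace_gib;  // temporaries allocated inside the solver
    double total_gib() const { return matrix_gib + vectors_gib + workspace_gib; }
};

MemoryEstimate estimate_bicgstab_memory(std::size_t n, std::size_t m, std::size_t p,
                                        std::size_t num_temporaries = 8)
{
    const double gib = 1024.0 * 1024.0 * 1024.0;

    // Size in bytes of one vector with n*m*p double entries.
    const double vector_bytes = static_cast<double>(n) * m * p * sizeof(double);

    MemoryEstimate e;
    e.matrix_gib    = 7.0 * vector_bytes / gib;
    e.vectors_gib   = 2.0 * vector_bytes / gib;
    e.workspace_gib = static_cast<double>(num_temporaries) * vector_bytes / gib;
    return e;
}
```

Calling estimate_bicgstab_memory(2000, 2000, 20) gives a total of roughly 10.1 GiB, which fits on a 12GB card, while estimate_bicgstab_memory(2500, 2000, 20) gives roughly 12.7 GiB, which does not. Note also that the code in the question calls cudaMemGetInfo after bicgstab has returned, by which point the solver's temporaries have presumably been freed again, so the 5.5GB figure would reflect mostly A, x, and b rather than the peak usage during the solve.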
