简体   繁体   中英

How to change sub-matrix of a sparse matrix on CUDA device

I have a sparse matrix structure that I am using in conjunction with CUBLAS to implement a linear solver class. I anticipate that the dimensions of the sparse matrices I will be solving will be fairly large (on the order of 10^7 by 10^7). I will also anticipate that the solver will need to be used many times and that a portion of this matrix will need be updated several times (between computing solutions) as well.

Copying the entire matrix sturcture from system memory to device memory could become quite a performance bottle neck since only a fraction of the matrix entries will ever need to be changed at a given time.

What I would like to be able to do is to have a way to update only a particular sub-set / sub-matrix rather than recopy the entire matrix structure from system memory to device memory each time I need to change the matrix.

The matrix data structure would reside on the CUDA device in arrays: d_col, d_row, and d_val

On the system side I would have corresponding arrays I, J, and val.

So ideally, I would only want to change the subsets of d_val that correspond to the values in the system array, val, that changed.

Note that I do not anticipate that any entries will be added to or removed from the matrix, only that existing entries will change in value.

Naively I would think that to implement this, I would have an integer array or vector on the host side, eg updateInds , that would track the indices of entries in val that have changed, but I'm not sure how to efficiently tell the CUDA device to update the corresponding values of d_val.

In essence: how do I change the entries in a CUDA device side array (d_val) at indicies updateInds[1],updateInds[2],...,updateInds[n] to a new set of values val[updatInds[1]], val[updateInds[2]], ..., val[updateInds[3]], with out recopying the entire val array from system memory into CUDA device memory array d_val?

As long as you only want to change the numerical values of the value array associated with CSR (or CSC, or COO) sparse matrix representation, the process is not complicated.

Suppose I have code like this (excerpted from the CUDA conjugate gradient sample ):

checkCudaErrors(cudaMalloc((void **)&d_val, nz*sizeof(float)));
...
cudaMemcpy(d_val, val, nz*sizeof(float), cudaMemcpyHostToDevice);

Now, subsequent to this point in the code, let's suppose I need to change some values in the d_val array, corresponding to changes I have made in val :

for (int i = 10; i < 25; i++)
  val[i] = 4.0f;

The process to move these particular changes is conceptually the same as if you were updating an array using memcpy , but we will use cudaMemcpy to update the d_val array on the device:

cudaMemcpy(d_val+10, val+10, 15*sizeof(float), cudaMempcyHostToDevice);

Since these values were all contiguous, I can use a single cudaMemcpy call to effect the transfer.

If I have several disjoint regions similar to above, it will require several calls to cudaMemcpy , one for each region. If, by chance, the regions are equally spaced and of equal length:

for (int i = 10; i < 5; i++)
  val[i] = 1.0f;
for (int i = 20; i < 5; i++)
  val[i] = 2.0f;
for (int i = 30; i < 5; i++)
  val[i] = 4.0f;

then it would also be possible to perform this transfer using a single call to cudaMemcpy2D . The method is outlined here .

Notes:

  1. cudaMemcpy2D is slower than you might expect compared to a cudaMemcpy operation on the same number of elements .
  2. CUDA API calls have some inherent overhead. If a large part of the matrix is to be updated in a scattered fashion, it may still be actually quicker to just transfer the whole d_val array, taking advantage of the fact that this can be done using a single cudaMemcpy operation.
  3. The method described here cannot be used if non-zero values change their location in the sparse matrix. In that case, I cannot provide a general answer for how to surgically update a CSR sparse matrix on the device. And certain relatively simple changes could necessitate updating most of the array data (3 vectors) anyway.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM