從傳遞的FORTRAN數組中生成CUSP coo_matrix

Question

我正在將CUSP求解器集成到現有的FORTRAN代碼中。 第一步，我只是嘗試傳遞一對整數數組和一個來自FORTRAN的浮點數（在FORTRAN中為real * 4），該浮點數將用於構造並打印COO格式的CUSP矩陣。

到目前為止，我已經能夠跟蹤該線程並進行所有編譯和鏈接：使用IFORT和nvcc和CUSP的未解析引用

不幸的是，該程序顯然正在將垃圾發送到CUSP矩陣中，並由於以下錯誤而崩潰：

$./fort_cusp_test
 testing 1 2 3
sparse matrix <1339222572, 1339222572> with 1339222568 entries
libc++abi.dylib: terminating with uncaught exception of type thrust::system::system_error: invalid argument

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x10ff86ff6
#1  0x10ff86593
#2  0x7fff8593ff19
Abort trap: 6

用於cuda和fortran源的代碼如下：

cusp_runner.cu

#include <stdio.h>
#include <cusp/coo_matrix.h>
#include <iostream>
#include <cusp/krylov/cg.h>
#include <cusp/print.h>

#if defined(__cplusplus)
extern "C" {
#endif

void test_coo_mat_print_(int * row_i, int * col_j, float * val_v, int n, int nnz ) {

   //wrap raw input pointers with thrust::device_ptr
   thrust::device_ptr<int> wrapped_device_I(row_i);
   thrust::device_ptr<int> wrapped_device_J(col_j);
   thrust::device_ptr<float> wrapped_device_V(val_v);

   //use array1d_view to wrap individual arrays
   typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
   typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;

   DeviceIndexArrayView row_indices(wrapped_device_I, wrapped_device_I + n);
   DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + nnz);
   DeviceValueArrayView values(wrapped_device_V, wrapped_device_V + nnz);

   //combine array1d_views into coo_matrix_view
   typedef   cusp::coo_matrix_view<DeviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView> DeviceView;

   //construct coo_matrix_view from array1d_views
   DeviceView A(n,n,nnz,row_indices,column_indices,values);

   cusp::print(A);
}
#if defined(__cplusplus)
}
#endif

fort_cusp_test.f90

program fort_cuda_test

   implicit none

interface
   subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
      use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
      implicit none
      integer(C_INT) :: n, nnz, row_i(:), col_j(:)
      real(C_FLOAT) :: val_v(:)
   end subroutine test_coo_mat_print_
end interface

   integer*4   n
   integer*4   nnz

   integer*4, target :: rowI(9),colJ(9)
   real*4, target :: valV(9)

   integer*4, pointer ::   row_i(:)
   integer*4, pointer ::   col_j(:)
   real*4, pointer ::   val_v(:)

   n     =  3
   nnz   =  9
   rowI =  (/ 1, 1, 1, 2, 2, 2, 3, 3, 3/)
   colJ =  (/ 1, 2, 3, 1, 2, 3, 1, 2, 3/)
   valV =  (/ 1, 2, 3, 4, 5, 6, 7, 8, 9/)

   row_i => rowI
   col_j => colJ
   val_v => valV

   write(*,*) "testing 1 2 3"

   call test_coo_mat_print_(row_i,col_j,val_v,n,nnz)

end program fort_cuda_test

如果您想自己嘗試，這是我的（不太優雅的）makefile：

Test:
   nvcc -Xcompiler="-fPIC" -shared cusp_runner.cu -o cusp_runner.so -I/Developer/NVIDIA/CUDA-6.5/include/cusp
   gfortran -c fort_cusp_test.f90
   gfortran fort_cusp_test.o cusp_runner.so -L/Developer/NVIDIA/CUDA-6.5/lib -lcudart -o fort_cusp_test

clean:
   rm *.o *.so

當然，需要更改庫路徑。

誰能為我指出正確的方向，以便如何正確地從fortran代碼中傳遞所需的數組？

刪除接口塊並在C函數的開頭添加一條print語句，我可以看到數組傳遞正確，但是n和nnz引起了問題。 我得到以下輸出：

$ ./fort_cusp_test
 testing 1 2 3
n: 1509677596, nnz: 1509677592
     i,  row_i,  col_j,        val_v
     0,      1,      1,   1.0000e+00
     1,      1,      2,   2.0000e+00
     2,      1,      3,   3.0000e+00
     3,      2,      1,   4.0000e+00
     4,      2,      2,   5.0000e+00
     5,      2,      3,   6.0000e+00
     6,      3,      1,   7.0000e+00
     7,      3,      2,   8.0000e+00
     8,      3,      3,   9.0000e+00
     9,      0,  32727,   0.0000e+00
    ...
    etc
    ...
    Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x105ce7ff6
#1  0x105ce7593
#2  0x7fff8593ff19
#3  0x105c780a2
#4  0x105c42dbc
#5  0x105c42df4
Segmentation fault: 11

fort_cusp_test

    interface
       subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
          use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
          implicit none
          integer(C_INT),value :: n, nnz
          integer(C_INT) :: row_i(:), col_j(:)
          real(C_FLOAT) :: val_v(:)
       end subroutine test_coo_mat_print_
    end interface

       integer*4   n
       integer*4   nnz

       integer*4, target :: rowI(9),colJ(9)
       real*4, target :: valV(9)

       integer*4, pointer ::   row_i(:)
       integer*4, pointer ::   col_j(:)
       real*4, pointer ::   val_v(:)

       n     =  3
       nnz   =  9
       rowI =  (/ 1, 1, 1, 2, 2, 2, 3, 3, 3/)
       colJ =  (/ 1, 2, 3, 1, 2, 3, 1, 2, 3/)
       valV =  (/ 1, 2, 3, 4, 5, 6, 7, 8, 9/)

       row_i => rowI
       col_j => colJ
       val_v => valV

       write(*,*) "testing 1 2 3"

       call test_coo_mat_print_(rowI,colJ,valV,n,nnz)

    end program fort_cuda_test

cusp_runner.cu

   #include <stdio.h>
    #include <cusp/coo_matrix.h>
    #include <iostream>
    // #include <cusp/krylov/cg.h>
    #include <cusp/print.h>

    #if defined(__cplusplus)
    extern "C" {
    #endif

    void test_coo_mat_print_(int * row_i, int * col_j, float * val_v, int n, int nnz ) {

       printf("n: %d, nnz: %d\n",n,nnz);

       printf("%6s, %6s, %6s, %12s \n","i","row_i","col_j","val_v");
       for(int i=0;i<n;i++) {
          printf("%6d, %6d, %6d, %12.4e\n",i,row_i[i],col_j[i],val_v[i]);
       }
       if ( false ) {
       //wrap raw input pointers with thrust::device_ptr
       thrust::device_ptr<int> wrapped_device_I(row_i);
       thrust::device_ptr<int> wrapped_device_J(col_j);
       thrust::device_ptr<float> wrapped_device_V(val_v);

       //use array1d_view to wrap individual arrays
       typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
       typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;

       DeviceIndexArrayView row_indices(wrapped_device_I, wrapped_device_I + n);
       DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + nnz);
       DeviceValueArrayView values(wrapped_device_V, wrapped_device_V + nnz);

       //combine array1d_views into coo_matrix_view
       typedef cusp::coo_matrix_view<DeviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView> DeviceView;

       //construct coo_matrix_view from array1d_views
       DeviceView A(n,n,nnz,row_indices,column_indices,values);

       cusp::print(A); }
    }
    #if defined(__cplusplus)
    }
    #endif

Answer 1

有兩種方法可以將參數從Fortran傳遞給C例程：第一種方法是使用接口塊（現代Fortran中的新方法），第二種方法是不使用接口塊（即使對於Fortran77也有效的舊方法）。

首先，以下是有關使用接口塊的第一種方法。 因為C例程希望接收C指針（row_i，col_j和val_v），所以我們需要從Fortran端傳遞這些變量的地址。 為此，我們必須在接口塊中使用星號（*）而非冒號（:)，如下所示。 （如果使用冒號，則這將告訴Fortran編譯器發送Fortran指針對象[1]的地址，這不是期望的行為。）而且，由於C例程中的n和nnz被聲明為值（不是指針）。，接口塊需要具有這些變量的VALUE屬性，以便Fortran編譯器發送n和nnz的值，而不是它們的地址。 總而言之，在第一種方法中，C和Fortran例程如下所示：

Fortran routine:
...
interface
    subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
        use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
        implicit none
        integer(C_INT) :: row_i(*), col_j(*)
        real(C_FLOAT) :: val_v(*)
        integer(C_INT), value :: n, nnz     !! see note [2] below also
    end subroutine test_coo_mat_print_
end interface
...
call test_coo_mat_print_( rowI, colJ, valV, n, nnz )

C routine:
void test_coo_mat_print_ (int * row_i, int * col_j, float * val_v, int n, int nnz )

以下是關於不帶接口塊的第二種方法。 在這種方法中，首先完全刪除接口塊和數組指針，然后按如下所示更改Fortran代碼

Fortran routine:

integer  rowI( 9 ), colJ( 9 ), n, nnz     !! no TARGET attribute necessary
real     valV( 9 )

! ...set rowI etc as above...

call test_coo_mat_print ( rowI, colJ, valV, n, nnz )   !! "_" is dropped

和C例程如下

void test_coo_mat_print_ ( int* row_i, int* col_j, float* val_v, int* n_, int* nnz_ )
{
    int n = *n_, nnz = *nnz_;

    printf( "%d %d \n", n, nnz );
    for( int k = 0; k < 9; k++ ) {
        printf( "%d %d %10.6f \n", row_i[ k ], col_j[ k ], val_v[ k ] );
    }

    // now go to thrust...
}

請注意，n_和nnz_在C例程中被聲明為指針，因為沒有接口塊，Fortran編譯器總是將實際參數的地址發送到C例程。 另請注意，在上述C例程中，將打印row_i等的內容，以確保正確傳遞參數。 如果打印的值正確，那么我猜想在推力例程的調用中更可能出現問題（包括如何傳遞n和nnz之類的尺寸信息）。

[1]聲明為“ real，pointer :: a（:)”的Fortran指針實際上表示類似於數組視圖類（在C ++術語中），與所指向的實際數據不同。 這里需要的是發送實際數據的地址，而不是此數組視圖對象的地址。 此外，接口塊（a（*））中的星號表示假定大小的數組，這是在Fortran中傳遞數組的一種舊方法。 在這種情況下，按預期方式傳遞了數組第一個元素的地址。

[2]如果n和NNZ中被聲明為C例程的指針（如在第二種方法中），則此值屬性不應該被附接，因為C例程希望的實際參數，而不是它們的值的地址。

從傳遞的FORTRAN數組中生成CUSP coo_matrix

問題描述

1 個解決方案

解決方案1
2 已采納 2015-08-15 02:34:15

從傳遞的FORTRAN數組中生成CUSP coo_matrix

問題描述

1 個解決方案

解決方案1 2 已采納 2015-08-15 02:34:15

解決方案1
2 已采納 2015-08-15 02:34:15