Calling CUDA from C results in errors

The errors I am receiving are:

'blockIdx' was not declared in this scope

expected primary-expression before '<' token

expected primary-expression before '>' token

expected primary-expression before '<' token

expected primary-expression before '>' token

(the '<' and '>' refer to the kernel call <<< >>>)

Also, in the main function I receive:

error: cannot convert 'float**' to 'float*' for argument '1' to 'void kernel_wrapper(float*, float*, int, int)'

cu file:

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <math.h>
#include <curand_kernel.h>
#include <cuda_runtime.h>
#include <cuda.h>

.....
__global__ void kernel(float* A,float *B, curandState* globalState, int Asize,int Bsize)
{
...

void kernel_wrapper(float* A_host,float* B_host, int Asize ,int Bsize)
{
...
//allocate host memory 
    A_host=(float*)malloc(Asize*sizeof(float));
    B_host=(float*)malloc(Bsize*sizeof(float));

    //allocate device memory
    float* A_dev,*B_dev;
    cudaMalloc((void**) &A_dev,Asize* sizeof(float));
    cudaMalloc((void**) &B_dev,Bsize* sizeof(float));
....

 kernel<<<1,1>>>(A_host,B_host, devStates,Asize,Bsize);
...

c file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <string.h>
#include <assert.h>
#include <stdarg.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include "solve.cu"


extern void kernel_wrapper(float* A,float* B, int Asize ,int Bsize);
...
int main()
{...
A = (float*)malloc(N*N*sizeof(float));
B = (float*)malloc(N*HS*sizeof(float));
...
kernel_wrapper(A,B,Asize ,Bsize);
...

I am compiling as:

 g++ -o mycode myfile.c -I/usr/local/cuda-5.5/include -L/usr/local/cuda-5.5/lib64 -lcurand -lcutil -lcudpp -lcuda -lstdc+

You can't include solve.cu, which contains device code (e.g. kernels), in a .c file and then compile it with g++. g++ does not understand CUDA constructs such as blockIdx or the <<< >>> launch syntax, which is where the first set of errors comes from.

Device code has to be compiled by nvcc.

Instead, you will need to compile the two files separately, then link them together.

I would suggest renaming your myfile.c to myfile.cpp.

Also remove this line from your myfile.cpp:

#include "solve.cu"

Then compile with:

nvcc -c solve.cu
g++ -c -I/usr/local/cuda-5.5/include  myfile.cpp 
g++ -o mycode solve.o myfile.o -L/usr/local/cuda-5.5/lib64 -lcudart -lcurand -lcutil -lcudpp -lcuda
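
A note on why the rename to .cpp helps: nvcc compiles the host code in solve.cu as C++, so kernel_wrapper gets a C++ (mangled) symbol name unless it is declared with C linkage. If you would rather keep the caller as a plain C file, one possible approach is a small shared header like the sketch below. The file name solve.h and the include guard name are assumptions for illustration, not part of the original code; the header would be included from both solve.cu (before the definition of kernel_wrapper) and the C file, in place of #include "solve.cu".

```c
/* solve.h (hypothetical file name): shared declaration of the wrapper. */
#ifndef SOLVE_H
#define SOLVE_H

#ifdef __cplusplus
extern "C" {   /* seen by nvcc/g++: give the wrapper C linkage */
#endif

/* Host-side entry point defined in solve.cu; callable from C or C++. */
void kernel_wrapper(float *A, float *B, int Asize, int Bsize);

#ifdef __cplusplus
}
#endif

#endif /* SOLVE_H */
```

With both sides compiled as C++, as in the commands above, the linkage block is not needed, but it does no harm.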

For the last issue, you are passing double pointers (**):

kernel_wrapper(&A,&B,Asize ,Bsize);

But the prototype expects single pointers (*):

extern void kernel_wrapper(float* A,float* B, int Asize ,int Bsize);

A and B are already of type float *, so it looks to me like you should pass them directly:

kernel_wrapper(A,B,Asize ,Bsize);

EDIT: Responding to a question below.

The problem is that the pointers A_host and B_host (the parameters of kernel_wrapper) are passed by value. kernel_wrapper allocates storage and assigns it to those pointers, but they are local copies, so the modified pointers are not (and cannot be) passed back to the calling function (i.e. the function that called kernel_wrapper).
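
To see the pass-by-value behaviour in isolation, here is a minimal host-only sketch with no CUDA involved. The function names alloc_by_value and alloc_by_address are made up for this illustration and do not appear in the original code.

```c
#include <stdlib.h>

/* Receives a COPY of the caller's pointer: assigning to p changes only
 * the local copy, so the caller's variable is left untouched. */
void alloc_by_value(float *p, int n)
{
    p = (float *)malloc((size_t)n * sizeof(float));
    /* The caller can never see this address, so release it here
     * rather than leak it. */
    free(p);
}

/* Receives the ADDRESS of the caller's pointer: writing through *p
 * updates the caller's variable. Returns 0 on success, -1 on failure. */
int alloc_by_address(float **p, int n)
{
    *p = (float *)malloc((size_t)n * sizeof(float));
    return (*p != NULL) ? 0 : -1;
}
```

If the caller starts with float *a = NULL and calls alloc_by_value(a, 4), a is still NULL afterwards. Calling alloc_by_address(&a, 4) leaves a pointing at the new storage, which the caller then owns and must free.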

You could allocate the storage for A_host and B_host in the calling function and then pass the pointers (in which case there is no need to malloc them in kernel_wrapper), or you could modify the kernel wrapper as follows:

void kernel_wrapper(float** A_host,float** B_host, int Asize ,int Bsize)
{
...
//allocate host memory 
    *A_host=(float*)malloc(Asize*sizeof(float));
    *B_host=(float*)malloc(Bsize*sizeof(float));

    //allocate device memory
    float* A_dev,*B_dev;
    cudaMalloc((void**) &A_dev,Asize* sizeof(float));
    cudaMalloc((void**) &B_dev,Bsize* sizeof(float));
....
 cudaMemcpy(A_dev, *A_host, Asize*sizeof(float), cudaMemcpyHostToDevice);
 cudaMemcpy(B_dev, *B_host, Bsize*sizeof(float), cudaMemcpyHostToDevice);

 kernel<<<1,1>>>(A_dev,B_dev, devStates,Asize,Bsize);
...

You would then also need to modify the calling line in your .cpp file (and change the extern prototype there to take float** as well):

int main()
{...
  float *A, *B;
  int Asize = N*N;
  int Bsize = N*HS;
...
  kernel_wrapper(&A,&B,Asize ,Bsize);
...

The way your code is posted now, you are doing a malloc operation twice each for A and B (once in main and once in kernel_wrapper), and that is not necessary.
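
For the other option mentioned earlier (the caller allocates and the wrapper only uses the buffers), a host-only sketch might look like the following. wrapper_caller_owned is a hypothetical stand-in for kernel_wrapper, and the fill loops only mark where results would be written; the real wrapper would do the cudaMalloc, cudaMemcpy and kernel launch at that point.

```c
/* Hypothetical stand-in for kernel_wrapper when the caller owns the
 * host buffers. The float* signature stays as it was, and the function
 * never reassigns A_host or B_host; it only writes through them. */
void wrapper_caller_owned(float *A_host, float *B_host, int Asize, int Bsize)
{
    /* Real wrapper (not shown): allocate A_dev and B_dev, copy the host
     * data to the device, launch the kernel, copy the results back. */
    for (int i = 0; i < Asize; ++i) {
        A_host[i] = 1.0f;   /* placeholder for results copied back */
    }
    for (int i = 0; i < Bsize; ++i) {
        B_host[i] = 2.0f;   /* placeholder for results copied back */
    }
}
```

With this layout main keeps its two malloc calls, passes A and B directly, and frees them when done, so each buffer is allocated exactly once.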
