Calling CUDA from C results in errors
The errors I am receiving are:
'blockIdx' was not declared in this scope
expected primary-expression before '<' token
expected primary-expression before '>' token
expected primary-expression before '<' token
expected primary-expression before '>' token
(the "<" and ">" refer to the kernel launch <<<>>>)
Also, in the main function I receive:
error: cannot convert 'float**' to 'float*' for argument '1' to 'void kernel_wrapper(float*, float*, int, int)'
cu file:
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <math.h>
#include <curand_kernel.h>
#include <cuda_runtime.h>
#include <cuda.h>
.....
__global__ void kernel(float* A,float *B, curandState* globalState, int Asize,int Bsize)
{
...
void kernel_wrapper(float* A_host,float* B_host, int Asize ,int Bsize)
{
...
//allocate host memory
A_host=(float*)malloc(Asize*sizeof(float));
B_host=(float*)malloc(Bsize*sizeof(float));
//allocate device memory
float* A_dev,*B_dev;
cudaMalloc((void**) &A_dev,Asize* sizeof(float));
cudaMalloc((void**) &B_dev,Bsize* sizeof(float));
....
kernel<<<1,1>>>(A_host,B_host, devStates,Asize,Bsize);
...
c file:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <string.h>
#include <assert.h>
#include <stdarg.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include "solve.cu"
extern void kernel_wrapper(float* A,float* B, int Asize ,int Bsize);
...
int main()
{...
A = (float*)malloc(N*N*sizeof(float));
B = (float*)malloc(N*HS*sizeof(float));
...
kernel_wrapper(A,B,Asize ,Bsize);
...
I am compiling as:
g++ -o mycode myfile.c -I/usr/local/cuda-5.5/include -L/usr/local/cuda-5.5/lib64 -lcurand -lcutil -lcudpp -lcuda -lstdc+
You can't include solve.cu, which contains device code (e.g. kernels), in a .c file and then compile it properly with g++. Device code has to be compiled by nvcc.
Instead, you will need to compile the two files separately, then link them together.
I would suggest renaming your myfile.c to myfile.cpp. Also remove this line from your myfile.cpp:
#include "solve.cu"
Then compile with:
nvcc -c solve.cu
g++ -c -I/usr/local/cuda-5.5/include myfile.cpp
g++ -o mycode solve.o myfile.o -L/usr/local/cuda-5.5/lib64 -lcudart -lcurand -lcutil -lcudpp -lcuda
For the last issue, you are passing double pointers (**):
kernel_wrapper(&A,&B,Asize ,Bsize);
Where the prototype is expecting single pointers (*):
extern void kernel_wrapper(float* A,float* B, int Asize ,int Bsize);
A and B are already of type float *, so it looks to me like you should pass them directly:
kernel_wrapper(A,B,Asize ,Bsize);
EDIT: Responding to a question below.
The problem is that the pointers A_host and B_host (the parameters to kernel_wrapper) are passed by value to the kernel wrapper function. The wrapper allocates storage and assigns it to those pointers, but they are only local copies, so the modified pointers reflecting the allocated storage are not (and cannot be) passed back to the calling function (i.e. the function that called kernel_wrapper).
You could allocate the storage for A_host and B_host in the calling function and then pass the pointers (in which case there is no need to malloc those pointers in kernel_wrapper), or you could modify the kernel wrapper as follows:
void kernel_wrapper(float** A_host,float** B_host, int Asize ,int Bsize)
{
...
//allocate host memory
*A_host=(float*)malloc(Asize*sizeof(float));
*B_host=(float*)malloc(Bsize*sizeof(float));
//allocate device memory
float* A_dev,*B_dev;
cudaMalloc((void**) &A_dev,Asize* sizeof(float));
cudaMalloc((void**) &B_dev,Bsize* sizeof(float));
....
cudaMemcpy(A_dev, *A_host, Asize*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(B_dev, *B_host, Bsize*sizeof(float), cudaMemcpyHostToDevice);
kernel<<<1,1>>>(A_dev,B_dev, devStates,Asize,Bsize);
...
You would then also need to modify your calling line in the .cpp file:
int main()
{...
float *A, *B;
int Asize = N*N;
int Bsize = N*NHS;
...
kernel_wrapper(&A,&B,Asize ,Bsize);
...
The way your code is posted now, you are doing a malloc operation twice each for A and B, and that is not necessary.