CUDA, Qt Creator and Mac

I'm having a hard time incorporating CUDA into Qt Creator.

I'm sure that the problem is coming from not having the right info in my .pro file. I have posted my current .pro file, my .cu file (DT_GPU.cu) and then the errors beneath that.

I've tried lots of combinations of .pro files taken from Linux and Windows, but nothing quite works. Furthermore, I've never seen a Mac/CUDA .pro file, so this could be a useful reference for anyone hoping to get all three working together.

Thanks in advance for any help.

.pro file:

CUDA_SOURCES += ../../Source/DT_GPU/DT_GPU.cu

CUDA_DIR = "/Developer/NVIDIA/CUDA-7.5"


SYSTEM_TYPE = 64            # '32' or '64', depending on your system
CUDA_ARCH = sm_21           # Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math


# include paths
INCLUDEPATH += $$CUDA_DIR/include

# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/

CUDA_OBJECTS_DIR = ./


# Add the necessary libraries
CUDA_LIBS = -lcublas_device \
    -lcublas_static \
    -lcudadevrt \
    -lcudart_static \
    -lcufft_static \
    -lcufftw_static \
    -lculibos \
    -lcurand_static \
    -lcusolver_static \
    -lcusparse_static \
    -lnppc_static \
    -lnppi_static \
    -lnpps_static

# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
LIBS += $$join(CUDA_LIBS,'.so ', '', '.so')
#LIBS += $$CUDA_LIBS

# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
    # Debug mode
    cuda_d.input = CUDA_SOURCES
    cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda_d.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
    # Release mode
    cuda.input = CUDA_SOURCES
    cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda.commands = $$CUDA_DIR/bin/nvcc $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda
}

DT_GPU.cu:

#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

__global__ void zero_GPU(double *l_p_array_gpu)
{
    int i = threadIdx.x;
    printf("  %i: Hello World!\n", i);
    l_p_array_gpu[i] = 0.;
}

void zero(double *l_p_array, int a_numElements)
{
    double *l_p_array_gpu;

    int size = a_numElements * int(sizeof(double));

    cudaMalloc((void**) &l_p_array_gpu, size);

    cudaMemcpy(l_p_array_gpu, l_p_array, size, cudaMemcpyHostToDevice);

    zero_GPU<<<size,1>>>(l_p_array_gpu);

    cudaMemcpy(l_p_array, l_p_array_gpu, size, cudaMemcpyDeviceToHost);

    cudaFree(l_p_array_gpu);
}

Warnings:

Makefile:848: warning: overriding commands for target `DT_GPU_cuda.o'
Makefile:792: warning: ignoring old commands for target `DT_GPU_cuda.o'
Makefile:848: warning: overriding commands for target `DT_GPU_cuda.o'
Makefile:792: warning: ignoring old commands for target `DT_GPU_cuda.o'

Errors:

In file included from ../SimplexSphereSource.cpp:8:
../../../Source/DT_GPU/DT_GPU.cu:75:19: error: expected expression
        zero_GPU<<<size,1>>>(l_p_array_gpu);
                  ^
../../../Source/DT_GPU/DT_GPU.cu:75:28: error: expected expression
        zero_GPU<<<size,1>>>(l_p_array_gpu);
                           ^
2 errors generated.
make: *** [SimplexSphereSource.o] Error 1
16:47:18: The process "/usr/bin/make" exited with code 2.
Error while building/deploying project SimplexSphereSource (kit: Desktop Qt 5.4.0 clang 64bit)
When executing step "Make"

I managed to get your example running with a few minor corrections to your .pro file. If you or anyone else is still interested in a larger C++/CUDA/Qt example for Mac and Linux, check this answer from a few months ago. Your particular situation (or at least what you've provided) doesn't require all the extra Qt frameworks and GUI setup, so the .pro file stays pretty simple. Also, judging by your error output, DT_GPU.cu is being #included from SimplexSphereSource.cpp and therefore compiled as ordinary C++ by clang, which doesn't understand the <<<...>>> launch syntax; the .cu file should only ever be compiled by nvcc through the QMAKE_EXTRA_COMPILERS rule, so make sure it isn't #included anywhere.

If you haven't already done so, you should make sure you have the latest CUDA Mac drivers and check that some of the basic CUDA samples compile and run (a quick standalone sanity check is sketched after the version list). I'm currently using:

  • OSX Version 10.10.5
  • Qt 5.5.0
  • NVCC v7.5.17
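
If you want a quick way to confirm that nvcc and the runtime work before involving qmake at all, a minimal device query along these lines can be compiled on its own (this is my own sketch, not one of the NVIDIA samples; the check_cuda.cu filename is just an example):

#include <cstdio>
#include <cuda_runtime.h>

// Minimal sanity check: list the CUDA devices the runtime can see.
// Build with something like: $CUDA_DIR/bin/nvcc check_cuda.cu -o check_cuda
int main(void)
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess)
    {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < deviceCount; ++i)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}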

I added a main method to the DT_GPU.cu file you provided and successfully ran the program using your .pro file with a few changes:

#CUDA_SOURCES += ../../Source/DT_GPU/DT_GPU.cu
CUDA_SOURCES += DT_GPU.cu # <-- same dir for this small example

CUDA_DIR = "/Developer/NVIDIA/CUDA-7.5"


SYSTEM_TYPE = 64            # '32' or '64', depending on your system
CUDA_ARCH = sm_21           # (tested with sm_30 on my comp) Type of CUDA architecture, for example 'compute_10', 'compute_11', 'sm_10'
NVCC_OPTIONS = --use_fast_math


# include paths
INCLUDEPATH += $$CUDA_DIR/include

# library directories
QMAKE_LIBDIR += $$CUDA_DIR/lib/

CUDA_OBJECTS_DIR = ./


# Add the necessary libraries
CUDA_LIBS = -lcudart # <-- changed this

# The following makes sure all path names (which often include spaces) are put between quotation marks
CUDA_INC = $$join(INCLUDEPATH,'" -I"','-I"','"')
#LIBS += $$join(CUDA_LIBS,'.so ', '', '.so') <-- didn't need this
LIBS += $$CUDA_LIBS # <-- needed this


# SPECIFY THE RPATH FOR NVCC (this caused me a lot of trouble before)
QMAKE_LFLAGS += -Wl,-rpath,$$CUDA_DIR/lib # <-- added this
NVCCFLAGS = -Xlinker -rpath,$$CUDA_DIR/lib # <-- and this

# Configuration of the Cuda compiler
CONFIG(debug, debug|release) {
    # Debug mode
    cuda_d.input = CUDA_SOURCES
    cuda_d.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda_d.commands = $$CUDA_DIR/bin/nvcc -D_DEBUG $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda_d.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda_d
}
else {
    # Release mode
    cuda.input = CUDA_SOURCES
    cuda.output = $$CUDA_OBJECTS_DIR/${QMAKE_FILE_BASE}_cuda.o
    cuda.commands = $$CUDA_DIR/bin/nvcc $$NVCC_OPTIONS $$CUDA_INC $$NVCC_LIBS --machine $$SYSTEM_TYPE -arch=$$CUDA_ARCH -c -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
    cuda.dependency_type = TYPE_C
    QMAKE_EXTRA_COMPILERS += cuda
}

And the DT_GPU.cu file with a main function and some minor changes:

#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <stdio.h> // <-- added for 'printf'


__global__ void zero_GPU(double *l_p_array_gpu)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // <-- in case you use more blocks
    printf("  %i: Hello World!\n", i);
    l_p_array_gpu[i] = 0.;
}


void zero(double *l_p_array, int a_numElements)
{
    double *l_p_array_gpu;

    int size = a_numElements * int(sizeof(double));

    cudaMalloc((void**) &l_p_array_gpu, size);

    cudaMemcpy(l_p_array_gpu, l_p_array, size, cudaMemcpyHostToDevice);

    // use one block with a_numElements threads
    zero_GPU<<<1, a_numElements>>>(l_p_array_gpu);

    cudaMemcpy(l_p_array, l_p_array_gpu, size, cudaMemcpyDeviceToHost);

    cudaFree(l_p_array_gpu);
}

// added a main function to run the program
int main(void)
{
    // host variables
    const int a_numElements = 5;
    double l_p_array[a_numElements];

    // run cuda function
    zero(l_p_array, a_numElements);

    // Print l_p_array
    printf("l_p_array: { ");
    for (int i = 0; i < a_numElements; ++i)
    {
        printf("%.2f ", l_p_array[i]);
    }
    printf("}\n");

    return 0;
}

Output:

  0: Hello World!
  1: Hello World!
  2: Hello World!
  3: Hello World!
  4: Hello World!
l_p_array: { 0.00 0.00 0.00 0.00 0.00 }

Once you get this working, make sure to take some time checking out basic CUDA syntax and examples before you get too far in; otherwise debugging is going to be a real hassle. Since I'm here, though, I figured I'd also let you know that the CUDA kernel launch syntax is
kernel_function<<<number_of_blocks, threads_per_block>>>(args) .
Your kernel call zero_GPU<<<size,1>>>(l_p_array_gpu) would actually create size blocks of a single thread each, when you want the opposite.
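
On that note, the single biggest help with debugging is checking the runtime's error state after CUDA calls and kernel launches. Here's a minimal sketch of the usual pattern (the CUDA_CHECK name is my own, not something from the toolkit):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage inside zero(), for example:
//   CUDA_CHECK(cudaMalloc((void**) &l_p_array_gpu, size));
//   zero_GPU<<<1, a_numElements>>>(l_p_array_gpu);
//   CUDA_CHECK(cudaGetLastError());       // catches bad launch configurations
//   CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised inside the kernel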

The following functions come from the CUDA samples and help determine how many threads and blocks you need for a given number of elements:

#include <algorithm> // <-- for std::min

typedef unsigned int uint;

inline uint iDivUp(uint a, uint b)
{
    return (a % b != 0) ? (a / b + 1) : (a / b);
}

// compute grid and thread block size for a given number of elements
inline void computeGridSize(uint n, uint blockSize, uint &numBlocks, uint &numThreads)
{
    numThreads = std::min(blockSize, n); // <-- std::min so this compiles in plain host code too
    numBlocks = iDivUp(n, numThreads);
}

You can add them to the top of your .cu file or to a helper header file and use them to properly call kernel functions. If you wanted to use them in your DT_GPU.cu file you would just add:

// desired thread count (may change if there aren't enough elements)
dim3 threads(64);
// default block count (will also change based on number of elements)
dim3 blocks(1);
computeGridSize(a_numElements, threads.x, blocks.x, threads.x);

// run kernel
zero_GPU<<<blocks, threads>>>(l_p_array_gpu);
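
One caveat with this approach: because iDivUp rounds up, numBlocks * numThreads can exceed the element count, so the kernel should know the array length and skip out-of-range indices before writing. A sketch of the adjusted kernel (the extra n parameter is my addition):

__global__ void zero_GPU(double *l_p_array_gpu, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) // threads past the end of the array do nothing
    {
        printf("  %i: Hello World!\n", i);
        l_p_array_gpu[i] = 0.;
    }
}

// launched with the element count as a second argument:
// zero_GPU<<<blocks, threads>>>(l_p_array_gpu, a_numElements);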

Anyway, got a little sidetracked but I hope this helps! Cheers!
