Separating the host-side and CUDA device-side versions of a library

Question

I have a library with some __host__ __device__ functions. I also have an #ifdef __CUDACC__ gadget which makes sure that a regular C++ compiler doesn't see the __host__ __device__ and can thus compile those functions.
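For reference, one common way to write such a gadget is a single macro that expands to the CUDA qualifiers only under nvcc. This is a minimal sketch: the macro name MY_LIB_HD is hypothetical and not part of the library shown below, which repeats the #ifdef at each declaration instead.

```cpp
// Hypothetical helper macro (the name MY_LIB_HD is an assumption).
#ifdef __CUDACC__
#define MY_LIB_HD __host__ __device__   // nvcc sees the execution-space qualifiers
#else
#define MY_LIB_HD                       // a regular C++ compiler sees nothing
#endif

// Declaration, as it would appear in a header.
MY_LIB_HD void foo(int* x, int* y);

// Definition, as it would appear in the source file: copies *y into *x.
MY_LIB_HD void foo(int* x, int* y) { *x = *y; }
```

Either form gives the same result: a regular C++ compiler compiles foo as an ordinary host function.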

Now, I want to use the compiled host-side version of my library's functions in a plain-vanilla C++ static library file (.a on Linux), and I would even like that library to be compilable when CUDA is unavailable; and I want the compiled device-side versions in a separate static library.

I am almost there (I think), but am stuck with a linking error. Here are toy sources for such a library, a test program (which calls both the device-side and the host-side version of a function) and the build commands I use.

What am I getting wrong?

my_lib.hpp (library header):

#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y);
int bar();

my_lib.cu (library source):

#include "my_lib.hpp"

#ifdef __CUDACC__
__host__ __device__
#endif
void foo(int*x, int* y)  { *x = *y; }

int bar() { return 5; }

main.cu (test program):

#include "my_lib.hpp"

__global__ void my_kernel() {
  int z { 78 };
  int w { 90 };
  foo(&z,&w);
}

int main() {
  int z { 123 };
  int w { 456 };
  foo(&z,&w);
  my_kernel<<<1,1>>>();
  cudaDeviceSynchronize();
  cudaDeviceReset();
}

My build commands:

c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.o
ranlib my_lib-cuda.a
nvcc -dc -o main.rdc.o main.cu
nvcc -dlink -o main.o main.rdc.o my_lib-cuda.a
c++ -o main main.o my_lib-noncuda.a -lcudart

And the errors I get on the last (linking) command:

/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416':
link.stub:(.text+0x5a): undefined reference to `__fatbinwrap_39_tmpxft_00003f88_00000000_6_main_cpp1_ii_e7ab3416'
/usr/bin/ld: main.o: in function `__cudaRegisterLinkedBinary_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6':
link.stub:(.text+0xaa): undefined reference to `__fatbinwrap_41_tmpxft_00003f69_00000000_6_my_lib_cpp1_ii_ab44b3f6'
collect2: error: ld returned 1 exit status

Notes:

I use CUDA 10.1 and g++ 9.2.1 on Devuan GNU/Linux.
This is a "follow-up" to a deleted question; @talonmies commented I had better show exactly what I did, and that changed the question somewhat.
Somewhat-related question: this one.

Answer 1

Let us modify your example into what I think your actual use case would be. The modification places main() into a .cpp file, to be compiled by g++, and the CUDA code into a separate .cu file, to be compiled by nvcc. This is important for making your two-library setup work. It is also justifiable, because "main contains CUDA kernels requiring separate compilation and linkage" is a peculiar corner case for the nvcc compilation model.

The restructured code:

main.cu:

#include "my_lib.hpp"

__global__ void my_kernel() {
  int z { 78 };
  int w { 90 };
  foo(&z,&w);
}

int cudamain()
{
  my_kernel<<<1,1>>>();
  return 0;
}

main.cpp:

#include <cuda_runtime_api.h>
#include "my_lib.hpp"

extern int cudamain();

int main() {
  int z { 123 };
  int w { 456 };
  foo(&z,&w);
  cudamain();
  cudaDeviceSynchronize();
  cudaDeviceReset();
}

All other files remain as in the question.

The commands required to build the program are now:

c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar qc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a

nvcc -std=c++11 -dc -o my_lib-cuda.rdc.o my_lib.cu
ar qc my_lib-cuda.a my_lib-cuda.rdc.o
ranlib my_lib-cuda.a

# Up to this point, essentially the same as what you tried in your question

nvcc -std=c++11 -c -rdc=true main.cu -o main.cu.o 
nvcc -dlink -o main.o main.cu.o my_lib-cuda.a

c++ -std=c++11 -o main main.cpp main.o main.cu.o -I/path/to/cuda/include \
    -L/path/to/cuda/lib64 my_lib-cuda.a my_lib-noncuda.a -lcudart -lcudadevrt

The important thing to keep in mind is that there are host-side components which need to be carried forward in the build. Thus you must pass the nvcc output of the CUDA host code (main.cu.o) to the main linkage, and you must also add your CUDA-side library (my_lib-cuda.a) to the main linkage. Otherwise the host-side runtime API support for your code will be missing. Note also that you must link the device runtime library (-lcudadevrt) to make this work.

Answer 2

Here is how you could create two libraries, one containing only CUDA device functions and the other containing only host functions. You could omit the "complicated" #if and the #ifndef guard, but then you would also have the non-CUDA code in your library my_lib-cuda.a.

For the other issues, see @talonmies' community wiki answer or refer to the link I already posted in the comments: https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/ (section "Advanced Usage: Using a Different Linker").

my_lib.cu:

#include "my_lib.hpp"

#ifdef __CUDA_ARCH__
__device__
#endif
#if (defined __CUDA_ARCH__) || (not defined __CUDACC__)
void foo(int*x, int* y)  { *x = *y; }
#endif

#ifndef __CUDACC__
int bar() { return 5; }
#endif
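To make the guard logic above easier to follow, here is a small sketch that mirrors the same conditions with compile-time flags. The flag names foo_is_compiled and bar_is_compiled are hypothetical and exist only for illustration. It relies on nvcc defining __CUDACC__ whenever it compiles a CUDA source file and __CUDA_ARCH__ only during its device compilation pass.

```cpp
// Same condition as the guard around foo in my_lib.cu.
#if defined(__CUDA_ARCH__) || !defined(__CUDACC__)
constexpr bool foo_is_compiled = true;   // nvcc device pass, or a regular C++ compiler
#else
constexpr bool foo_is_compiled = false;  // nvcc host pass: foo is skipped
#endif

// Same condition as the guard around bar in my_lib.cu.
#ifndef __CUDACC__
constexpr bool bar_is_compiled = true;   // regular C++ compiler only
#else
constexpr bool bar_is_compiled = false;  // any nvcc pass: bar is skipped
#endif
```

Built with a regular C++ compiler, both flags are true, which corresponds to my_lib-noncuda.a holding host versions of foo and bar. Under nvcc only the device pass keeps foo, which corresponds to my_lib-cuda.a holding just the device version.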

The build process of the libraries stays the same. The only change is ar qc to ar rc, which replaces existing files, so that you don't get an error when rebuilding without deleting the library beforehand.

c++ -c -x c++ -o my_lib-noncuda.o my_lib.cu
ar rc my_lib-noncuda.a my_lib-noncuda.o
ranlib my_lib-noncuda.a
nvcc -dc -o my_lib-cuda.o my_lib.cu
ar rc my_lib-cuda.a my_lib-cuda.o 
ranlib my_lib-cuda.a

Building a CUDA program (simplified by using only nvcc and not c++; alternatively, have a look at @talonmies' community wiki answer):

nvcc -dc main.cu -o main.o
nvcc main.o my_lib-cuda.a my_lib-noncuda.a -o main

The link to my_lib-noncuda.a can be omitted if you also omit the #if and #ifndef in my_lib.cu as described above.

Building a C++ program (given that there are #ifdef __CUDACC__ guards around the CUDA code in main.cu):

c++ -x c++ -c main.cu -o main.o
c++ main.o my_lib-noncuda.a -o main
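The guards mentioned for main.cu could look roughly like the sketch below. It is an illustration under assumptions, not the original file: foo is defined inline as a stand-in for #include "my_lib.hpp" so the sketch is self-contained, and the entry point is given the hypothetical name run_demo (in main.cu it would be main) and returns z so the host-side result can be inspected.

```cpp
// Stand-in for #include "my_lib.hpp" plus the library definition of foo.
#ifdef __CUDACC__
__host__ __device__
#endif
inline void foo(int* x, int* y) { *x = *y; }

#ifdef __CUDACC__
// The kernel only exists when nvcc compiles this file.
__global__ void my_kernel() {
  int z { 78 };
  int w { 90 };
  foo(&z, &w);
}
#endif

// Hypothetical name; in main.cu this body would be main().
int run_demo() {
  int z { 123 };
  int w { 456 };
  foo(&z, &w);             // host-side call, available in both builds
#ifdef __CUDACC__
  my_kernel<<<1,1>>>();    // kernel launch and runtime calls only under nvcc
  cudaDeviceSynchronize();
  cudaDeviceReset();
#endif
  return z;                // foo copied w into z
}
```

With the guards in place, a regular C++ compiler sees only the host-side call to foo, so the file can be built and linked against my_lib-noncuda.a alone.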

Answer 2
1 2019-12-17 15:01:45