
Compile CUDA code with relocatable device code through Python distutils (for a Python C extension)

I have some CUDA code that uses cooperative groups and therefore requires the -rdc=true flag to compile with nvcc. I would like to call the CUDA code from Python, so I am writing a Python interface as a Python C extension.

Because I'm including CUDA code, I had to adapt my setup.py as described in: Can python distutils compile CUDA code?

This compiles and installs, but as soon as I import my module in Python, it segfaults. Removing the -rdc=true flag makes everything work, but forces me to remove all cooperative group code from the CUDA kernels (otherwise I get a 'cudaCGGetIntrinsicHandle unresolved' error during compilation).

Is there any way I can adapt my setup.py further to get this to work? Alternatively, is there another way to compile my C extension that allows CUDA code (with the rdc flag on)?

I think I have more or less figured out the answer. If you generate relocatable device code with nvcc, then either nvcc has to link the object files itself, so that device code linking is handled correctly, or you have to generate a separate object file by running nvcc with the '--device-link' flag on all the object files that contain relocatable device code. This extra object file can then be passed to an external linker together with all the other object files.
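
To make the second option concrete, here is a minimal sketch of the three steps as plain command lines, built as Python lists. The helper name `rdc_build_commands`, the output name `link.o`, the use of `g++` as the external linker and the default library directory are assumptions for illustration only; the nvcc flags are the ones used in this answer.

```python
def rdc_build_commands(cu_sources, host_objects, output, arch="sm_70",
                       cuda_lib_dir="/usr/local/cuda/lib64"):
    """Return the command lines for the compile, device-link and host-link steps.

    Hypothetical helper: it only builds the argument lists, it runs nothing.
    """
    pic = ["--compiler-options", "-fPIC"]
    cu_objects = [src.rsplit(".", 1)[0] + ".o" for src in cu_sources]

    # Step 1: compile every .cu file into an object with relocatable device code.
    compile_cmds = [
        ["nvcc", f"-arch={arch}", "-rdc=true", "-c", src, "-o", obj] + pic
        for src, obj in zip(cu_sources, cu_objects)
    ]

    # Step 2: device-link all of those objects into one extra object file.
    dlink_cmd = (["nvcc", f"-arch={arch}", "--device-link"] + cu_objects
                 + ["-o", "link.o"] + pic)

    # Step 3: hand everything, including link.o, to the external (host) linker.
    link_cmd = (["g++", "-shared"] + host_objects + cu_objects + ["link.o"]
                + [f"-L{cuda_lib_dir}", "-lcudadevrt", "-lcudart", "-o", output])

    return compile_cmds + [dlink_cmd, link_cmd]


for cmd in rdc_build_commands(["file_with_cuda_code.cu"],
                              ["python_wrapper.o"],
                              "mypythonextension.so"):
    print(" ".join(cmd))
```

The setup.py changes below make distutils issue the equivalent of steps 1 and 2; step 3 is the normal extension link, which only needs cudadevrt added to the libraries.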

I adapted the setup from Can python distutils compile CUDA code? by adding a dummy 'link.cu' file to the end of the sources list. I also added the cudadevrt library and another set of compiler options for the CUDA device linking step:

# CUDA and numpy_include are defined earlier in setup.py
ext = Extension('mypythonextension',
                # 'link.cu' must come last: it triggers the device-link step
                sources=['python_wrapper.cpp', 'file_with_cuda_code.cu', 'link.cu'],
                library_dirs=[CUDA['lib64']],
                libraries=['cudart', 'cudadevrt'],
                runtime_library_dirs=[CUDA['lib64']],
                # one set of options per compile step, selected in _compile
                extra_compile_args={'gcc': [],
                                    'nvcc': ['-arch=sm_70', '-rdc=true', '--compiler-options', "'-fPIC'"],
                                    'nvcclink': ['-arch=sm_70', '--device-link', '--compiler-options', "'-fPIC'"]
                                    },
                include_dirs=[numpy_include, CUDA['include'], 'src'])
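
The `CUDA` dictionary is not defined in the snippet above; it comes from the rest of setup.py. A minimal sketch of how such a dictionary could be built is shown here. The keys match the ones used in this answer, but the function name `locate_cuda`, the `CUDAHOME` environment variable and the `bin`/`include`/`lib64` layout are assumptions about the local installation.

```python
import os
import shutil


def locate_cuda(environ=None):
    """Return a dict with the 'home', 'nvcc', 'include' and 'lib64' paths.

    Assumption: the toolkit root is given by CUDAHOME, or can be derived
    from the location of the nvcc binary found on PATH.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("CUDAHOME")
    if home:
        nvcc = os.path.join(home, "bin", "nvcc")
    else:
        # Fall back to searching PATH for the nvcc executable.
        nvcc = shutil.which("nvcc", path=environ.get("PATH"))
        if nvcc is None:
            raise EnvironmentError(
                "nvcc not found: set CUDAHOME or add nvcc to PATH")
        home = os.path.dirname(os.path.dirname(nvcc))
    return {
        "home": home,
        "nvcc": nvcc,
        "include": os.path.join(home, "include"),
        "lib64": os.path.join(home, "lib64"),
    }
```

In setup.py this would be called once near the top, for example `CUDA = locate_cuda()`, before the `Extension` is created.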

This then gets picked up in the following way by the function that adapts the compiler calls:

import os

def customize_compiler_for_nvcc(self):
    # let the compiler object accept .cu source files
    self.src_extensions.append('.cu')

    # track all the object files generated with cuda device code
    self.cuda_object_files = []

    # keep the defaults so they can be restored after every nvcc call
    default_compiler_so = self.compiler_so
    default_compile = self._compile

    def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
        # generate a special object file that will contain linked in
        # relocatable device code
        if src == 'link.cu':
            self.set_executable('compiler_so', CUDA['nvcc'])
            postargs = extra_postargs['nvcclink']
            # pass the recorded cuda objects to nvcc instead of link.cu itself
            cc_args = self.cuda_object_files[1:]
            src = self.cuda_object_files[0]
        elif os.path.splitext(src)[1] == '.cu':
            self.set_executable('compiler_so', CUDA['nvcc'])
            postargs = extra_postargs['nvcc']
            self.cuda_object_files.append(obj)
        else:
            postargs = extra_postargs['gcc']
        default_compile(obj, src, ext, cc_args, postargs, pp_opts)
        # reset the default compiler for the next source file
        self.compiler_so = default_compiler_so

    self._compile = _compile
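
The function above still has to be applied to the compiler object before the extension is built. One way to do that, sketched here as an assumption rather than as the exact code from my setup.py, is a `build_ext` subclass that patches `self.compiler` and then runs the normal build. The helper name `make_cuda_build_ext` is hypothetical; it takes the base command class as a parameter so the sketch does not depend on a particular distutils or setuptools version.

```python
def make_cuda_build_ext(base_build_ext, customize):
    """Return a build_ext subclass that patches the compiler before building.

    base_build_ext: the build_ext command class to extend.
    customize: a function such as customize_compiler_for_nvcc.
    """

    class CudaBuildExt(base_build_ext):
        def build_extensions(self):
            # self.compiler is the compiler object created by build_ext;
            # patch its _compile method, then run the regular build.
            customize(self.compiler)
            super().build_extensions()

    return CudaBuildExt


# Typical use inside setup.py (not executed here):
#   from setuptools import setup
#   from setuptools.command.build_ext import build_ext
#   setup(name='mypythonextension',
#         ext_modules=[ext],
#         cmdclass={'build_ext': make_cuda_build_ext(
#             build_ext, customize_compiler_for_nvcc)})
```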

The solution feels a bit hackish because of my lack of distutils knowledge, but it seems to work. :)

