
Compile CUDA code with relocatable device code through Python distutils (for a Python C extension)

I have some CUDA code that uses cooperative groups and therefore requires the -rdc=true flag to compile with nvcc. I would like to call the CUDA code from Python, so I am writing a Python interface with Python C extensions.

Because I'm including CUDA code, I had to adapt my setup.py as described in: Can python distutils compile CUDA code?

This compiles and installs, but as soon as I import my code in Python, it segfaults. Removing the -rdc=true flag makes everything work, but forces me to remove any cooperative group code from the CUDA kernels (or I get a 'cudaCGGetIntrinsicHandle unresolved' error during compilation).

Is there any way I can adapt my setup.py further to get this to work? Alternatively, is there another way to compile my C extension that allows CUDA code (with the rdc flag on)?

I think I have figured out the answer. If you generate relocatable device code with nvcc, then either nvcc itself must perform the final link so that device code linking is handled correctly, or you need to generate a separate object file by running nvcc with the '--device-link' flag on all the object files that contain relocatable device code. That extra object file can then be passed, together with all the other object files, to an external (host) linker.
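On the command line, this corresponds to a three-step sequence, which the adapted setup.py below automates. The following is only a sketch that builds the argument lists (file names and the sm_70 architecture are placeholders, not part of the original post):

```python
def rdc_compile_cmd(src, obj, arch='sm_70'):
    """Compile one .cu file into an object file with relocatable device code."""
    return ['nvcc', '-arch=' + arch, '-rdc=true',
            '--compiler-options', '-fPIC', '-c', src, '-o', obj]

def device_link_cmd(objs, out, arch='sm_70'):
    """Run the separate device-link step over all objects that contain
    device code, producing one extra object file for the host linker."""
    return ['nvcc', '-arch=' + arch, '--device-link',
            '--compiler-options', '-fPIC'] + objs + ['-o', out]

def host_link_cmd(objs, devlink_obj, out):
    """Let the host linker build the shared library; cudadevrt is required
    because of the device-link step."""
    return (['g++', '-shared'] + objs + [devlink_obj, '-o', out,
            '-lcudart', '-lcudadevrt'])
```

Each list could be handed to subprocess.run; distutils effectively issues the same calls once the compiler is customized as shown below.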

I adapted the setup from Can python distutils compile CUDA code? by adding a dummy 'link.cu' file to the end of the sources list. I also added the cudadevrt library and a separate set of compiler options for the CUDA device-linking step:

ext = Extension('mypythonextension',
                sources=['python_wrapper.cpp', 'file_with_cuda_code.cu', 'link.cu'],
                library_dirs=[CUDA['lib64']],
                libraries=['cudart', 'cudadevrt'],
                runtime_library_dirs=[CUDA['lib64']],

                extra_compile_args={'gcc': [],
                                    'nvcc': ['-arch=sm_70', '-rdc=true', '--compiler-options', "'-fPIC'"],
                                    'nvcclink': ['-arch=sm_70', '--device-link', '--compiler-options', "'-fPIC'"]
                                    },
                include_dirs = [numpy_include, CUDA['include'], 'src'])
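As in the linked answer, the extension is then registered with a custom build_ext that applies the compiler customization. This is a sketch of that wiring; the placeholder here stands in for the real customize_compiler_for_nvcc defined further down:

```python
from setuptools.command.build_ext import build_ext

def customize_compiler_for_nvcc(compiler):
    """Placeholder for the function defined further below in this post."""
    compiler.src_extensions.append('.cu')

class custom_build_ext(build_ext):
    """build_ext subclass that patches the compiler object so that .cu
    files and the dummy link.cu step are routed through nvcc."""
    def build_extensions(self):
        customize_compiler_for_nvcc(self.compiler)
        build_ext.build_extensions(self)

# registered via:
# setup(name='mypythonextension', ext_modules=[ext],
#       cmdclass={'build_ext': custom_build_ext})
```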

These options are then picked up by the function that adapts the compiler calls:

def customize_compiler_for_nvcc(self):
    self.src_extensions.append('.cu')

    # save the default host compiler so it can be restored
    # after each nvcc invocation
    default_compiler_so = self.compiler_so

    # track all the object files generated with cuda device code
    self.cuda_object_files = []

    super = self._compile

    def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
        # generate a special object file that will contain linked in
        # relocatable device code
        if src == 'link.cu':
            self.set_executable('compiler_so', CUDA['nvcc'])
            postargs = extra_postargs['nvcclink']
            # reuse the cc_args/src slots to hand nvcc all the collected
            # object files for the --device-link step
            cc_args = self.cuda_object_files[1:]
            src = self.cuda_object_files[0]
        elif os.path.splitext(src)[1] == '.cu':
            self.set_executable('compiler_so', CUDA['nvcc'])
            postargs = extra_postargs['nvcc']
            self.cuda_object_files.append(obj)
        else:
            postargs = extra_postargs['gcc']
        super(obj, src, ext, cc_args, postargs, pp_opts)
        # restore the default host compiler for the next source file
        self.compiler_so = default_compiler_so

    self._compile = _compile
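The three-way branch inside _compile can be summarized as a small pure function (an illustration only, not part of the build code):

```python
import os

def choose_toolchain(src):
    """Mirror the dispatch in _compile: the dummy link.cu triggers the
    device-link step, other .cu files go through nvcc with -rdc=true,
    and everything else goes through the regular host compiler."""
    if src == 'link.cu':
        return 'nvcclink'
    if os.path.splitext(src)[1] == '.cu':
        return 'nvcc'
    return 'gcc'
```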

The solution feels a bit hackish because of my lack of distutils knowledge, but it seems to work. :)
