If the separate compilation units that are fed as input to nvlink contain CUDA kernels and device functions that invoke device functions marked as __forceinline__, will these functions be inlined? Assume they would be inlined if one put all the source code into a single file.
To the best of my knowledge, the CUDA device code linker can't do this. The __forceinline__ directive is a compiler-level operation, and after compilation there is no way of marking code as inlineable in either PTX or SASS. If you try this, the CUDA device code compiler should emit a warning that an external inline function was used but not defined.
If you want functions to be compiled inline, you have to (unsurprisingly) use a compiler, not a linker.
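In practice that usually means making the definition of the function visible to the compiler in every file that calls it, typically by putting it in a shared header. The following is only a minimal sketch of that idea, collapsed into a single file so that it compiles on its own; the names scale, scale_kernel and the header name scale.cuh are hypothetical and not from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// In a multi-file build, this definition would live in a shared header
// (hypothetical name: scale.cuh) included by every .cu file that calls it,
// so the compiler sees the body at each call site and can inline it.
__device__ __forceinline__ float scale(float x, float factor)
{
    return x * factor;
}

// Any kernel in any compilation unit that includes the header can call
// scale(); the call is resolved by the compiler, not by nvlink.
__global__ void scale_kernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scale(data[i], factor);
    }
}

int main()
{
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = static_cast<float>(i);
    }

    // Copy the input to the device, run the kernel, and copy the result back.
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_kernel<<<(n + 127) / 128, 128>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("%f\n", host[3]);
    return 0;
}
```

With this layout, only functions that are not meant to be inlined need to be declared in one file and defined in another, and those are the calls the device linker resolves.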