How to bind different kinds of textures to a texture reference in CUDA?

This piece of code works on CUDA 4.2:

extern "C" texture<int,1,cudaReadModeElementType> __tex0;
extern "C" __global__ void kernel(){
  float4 f = tex1Dfetch(*(texture<float4,1,cudaReadModeElementType>*)&__tex0,ii_z);
}

Since CUDA has changed its syntax, I can no longer fetch a different element type through a texture reference. Any ideas?

PS: I've found the CUDA texture object in the reference, but changing every occurrence would be a lot of work. Is there a better solution that needs only minor code changes?

Thanks

If anyone wants the original code, please click here.

It seems like the minimum repro case for this is:

texture<int,1,cudaReadModeElementType> __tex0;

__global__ void kernel0(float4 *out)
{
    int t__a = blockIdx.x*blockDim.x+threadIdx.x;
    int ii = (t__a*3);
    float4 rr = tex1Dfetch(*(texture<float4,1,cudaReadModeElementType>*)&__tex0,ii);
    out[t__a] = rr;
}

CUDA 7.5 will fail to compile this kernel with an error:

texture_repo.cu(7): error: cannot take address of texture/surface variable "__tex0" in __device__/__global__ functions

I believe this is correct. Texture references are opaque placeholder types which don't have any of the usual properties of POD types, and I would be very suspicious about ever writing code like the example you linked to.

However, it is true that CUDA 4.2 will compile this and emit valid PTX:

.entry _Z7kernel0P6float4(
        .param .u64 _Z7kernel0P6float4_param_0
)
{
        .reg .f32       %f<25>;
        .reg .s32       %r<8>;
        .reg .s64       %rl<5>;


        ld.param.u64    %rl1, [_Z7kernel0P6float4_param_0];
        cvta.to.global.u64      %rl2, %rl1;
        .loc 2 5 1
        mov.u32         %r2, %ntid.x;
        mov.u32         %r3, %ctaid.x;
        mov.u32         %r4, %tid.x;
        mad.lo.s32      %r5, %r2, %r3, %r4;
        .loc 2 6 1
        mul.lo.s32      %r1, %r5, 3;
        mov.u32         %r6, 0;
        // inline asm
        tex.1d.v4.f32.s32 {%f1, %f2, %f3, %f4}, [__tex0, {%r1}];
        // inline asm
        .loc 2 8 1
        mul.wide.s32    %rl3, %r5, 16;
        add.s64         %rl4, %rl2, %rl3;
        st.global.v4.f32        [%rl4], {%f1, %f2, %f3, %f4};
        .loc 2 9 2
        ret;
}

The cast apparently has no effect other than suppressing a compiler error. At the PTX level the read works because texture reference reads always return a four-wide vector type, even if the extra vector elements are empty and ignored. I would regard the fact that this compiles in CUDA 4.2 as a compiler bug, and it would seem that CUDA 7.5 is correct in this case.

That said, a very hacky workaround would be to do this:

texture<int,1,cudaReadModeElementType> __tex0;

__device__ float4 tex_load0(int idx)
{
    float4 temp;
    asm("tex.1d.v4.f32.s32 {%0, %1, %2, %3}, [__tex0, {%4}];" :
        "=f"(temp.x), "=f"(temp.y), "=f"(temp.z), "=f"(temp.w) : "r"(idx));
    return temp;
}

__global__ void kernel1(float4 *out)
{
    int t__a = blockIdx.x*blockDim.x+threadIdx.x;
    int ii = (t__a*3);
    float4 rr = tex_load0(ii); 
    out[t__a] = rr;
}

[DISCLAIMER: compiled but never tested. Not recommended. Use at own risk].

That is, take the same PTX that the CUDA 4.2 compiler emitted inline, insert it into a device function, and replace the texture fetches with calls to that device function. With the CUDA 7.5 toolchain, this emits:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-19856038
// Cuda compilation tools, release 7.5, V7.5.17
// Based on LLVM 3.4svn
//

.version 4.3
.target sm_30
.address_size 64

    // .globl   _Z9tex_load0i
.global .texref __tex0;

.visible .func  (.param .align 16 .b8 func_retval0[16]) _Z9tex_load0i(
    .param .b32 _Z9tex_load0i_param_0
)
{
    .reg .f32   %f<5>;
    .reg .b32   %r<2>;


    ld.param.u32    %r1, [_Z9tex_load0i_param_0];
    // inline asm
    tex.1d.v4.f32.s32 {%f1, %f2, %f3, %f4}, [__tex0, {%r1}];
    // inline asm
    st.param.f32    [func_retval0+0], %f1;
    st.param.f32    [func_retval0+4], %f2;
    st.param.f32    [func_retval0+8], %f3;
    st.param.f32    [func_retval0+12], %f4;
    ret;
}

    // .globl   _Z7kernel1P6float4
.visible .entry _Z7kernel1P6float4(
    .param .u64 _Z7kernel1P6float4_param_0
)
{
    .reg .f32   %f<5>;
    .reg .b32   %r<6>;
    .reg .b64   %rd<5>;


    ld.param.u64    %rd1, [_Z7kernel1P6float4_param_0];
    cvta.to.global.u64  %rd2, %rd1;
    mov.u32     %r2, %ctaid.x;
    mov.u32     %r3, %ntid.x;
    mov.u32     %r4, %tid.x;
    mad.lo.s32  %r5, %r3, %r2, %r4;
    mul.lo.s32  %r1, %r5, 3;
    mul.wide.s32    %rd3, %r5, 16;
    add.s64     %rd4, %rd2, %rd3;
    // inline asm
    tex.1d.v4.f32.s32 {%f1, %f2, %f3, %f4}, [__tex0, {%r1}];
    // inline asm
    st.global.v4.f32    [%rd4], {%f1, %f2, %f3, %f4};
    ret;
}

which is the same PTX as the CUDA 4.2 toolchain emitted. This works because the compiler can't apply nearly the same level of type safety checking to inline PTX. But think hard about whether you really want to do this, because it is (in my opinion) undefined behaviour.

Also note that because of the way texture references are handled in PTX, you can't pass them as explicit arguments, so you will need to define one read function per texture in your code.
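
For comparison, here is a minimal sketch of the texture object route that the question mentions. With texture objects the element type is chosen at the fetch site (tex1Dfetch<float4>), so no cast on a texture reference is needed, and the object can be passed to the kernel as an ordinary argument. The names copy_texels, make_linear_texture and fetch_texel_demo are hypothetical, error checking is omitted for brevity, and the sketch assumes a device of compute capability 3.0 or later. It is an illustration under those assumptions, not a drop-in replacement for the original code.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// The fetch type is selected here, per call, via the template argument.
__global__ void copy_texels(cudaTextureObject_t tex, float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch<float4>(tex, i);
}

// Hypothetical helper: wrap a linear device buffer in a texture object
// whose texels are described as type T.
template <typename T>
cudaTextureObject_t make_linear_texture(void *dev_ptr, size_t bytes)
{
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dev_ptr;
    res.res.linear.desc = cudaCreateChannelDesc<T>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, NULL);
    return tex;
}

// Hypothetical demo: upload four float4 texels, read them back through the
// texture object, and return the texel at index `which` (0..3).
float4 fetch_texel_demo(int which)
{
    const int n = 4;
    float4 host_in[n], host_out[n];
    for (int i = 0; i < n; ++i)
        host_in[i] = make_float4(4.0f * i, 4.0f * i + 1.0f,
                                 4.0f * i + 2.0f, 4.0f * i + 3.0f);

    float4 *dev_in = NULL, *dev_out = NULL;
    cudaMalloc(&dev_in, sizeof(host_in));
    cudaMalloc(&dev_out, sizeof(host_out));
    cudaMemcpy(dev_in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    cudaTextureObject_t tex =
        make_linear_texture<float4>(dev_in, sizeof(host_in));
    copy_texels<<<1, n>>>(tex, dev_out, n);
    cudaMemcpy(host_out, dev_out, sizeof(host_out), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dev_in);
    cudaFree(dev_out);
    return host_out[which];
}
```

Calling fetch_texel_demo(2) should return the third texel uploaded by the demo. The cost the question points out still applies: every kernel that reads the texture has to take the cudaTextureObject_t as a parameter, so each fetch site and launch needs touching.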
