How do I use tex2D in PyCuda?

Question

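I'm a Python programmer and recently started using PyCuda because I need to write custom filters for image processing. I came across tex2D, which looks like a very elegant way to handle padding and out-of-range accesses. My problem is that I'm quite confused about how to pass the data to the CUDA kernel.

This is what I have so far: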

#!/usr/bin/env python3
"""minimal example: cuda kernel that returns the input using textures"""

import numpy as np
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.autoinit
from pycuda.tools import dtype_to_ctype

# cuda kernel
mod = SourceModule("""
#include <pycuda-helpers.hpp>

texture<fp_tex_float, 2> my_tex;

__global__ void return_input(const int input_width, const int input_height, float *output)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;

    if(row < input_height && col < input_width)
    {
        int index = col * input_width + row;
        output[index] = tex2D(my_tex, row, col);
    }
}
""")

# get from cuda kernel
return_input = mod.get_function('return_input')
my_tex = mod.get_texref('my_tex')

# setup texture
shape = (5, 5)
img_cpu = np.random.rand(*shape).astype(np.float32)
print(img_cpu)
img_gpu = cuda.matrix_to_array(img_cpu, order='C', allow_double_hack=True)
my_tex.set_array(img_gpu)

# setup output
out_cpu = np.zeros((shape), dtype=np.float32)
out_gpu = cuda.to_device(out_cpu)

# build grid
blocksize = 32
img_height, img_width = np.shape(img_cpu)
grid = (int(np.ceil(img_height / blocksize)),
        int(np.ceil(img_width / blocksize)),
        1)

# call cuda kernel
return_input(img_width,
             img_height,
             out_gpu,
             block=(blocksize, blocksize, 1),
             grid=grid)

# copy back to host
cuda.memcpy_dtoh(out_gpu, out_cpu)
print(out_cpu)

Answer 1

For anyone else who stumbles over the same problem, here is my solution:

The CUDA file, named minimal_kernel.cu:

#include <pycuda-helpers.hpp>

texture<float, 2> my_tex;

__global__ void return_input(const int input_width, const int input_height, float *output)
{
    // x runs along the image width (columns), y along the height (rows)
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if(x < input_width && y < input_height)
    {
        int index = y * input_width + x;  // row-major (C order) output index
        output[index] = tex2D(my_tex, x, y);
    }
}
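
For readers unsure how the thread indices map onto the texture, the following CPU-only NumPy sketch mimics what the kernel computes for a matrix uploaded in C order. `tex2d_clamp` and `return_input_cpu` are hypothetical helpers written for illustration, not part of PyCuda. The sketch assumes unnormalized coordinates, point sampling, and clamp addressing (an out-of-range coordinate returns the nearest edge pixel), with the first texture coordinate running along the columns.

```python
import numpy as np


def tex2d_clamp(img, x, y):
    """Hypothetical CPU stand-in for a tex2D fetch (clamp addressing, point sampling)."""
    height, width = img.shape
    xi = min(max(int(np.floor(x)), 0), width - 1)   # clamp the column index
    yi = min(max(int(np.floor(y)), 0), height - 1)  # clamp the row index
    return img[yi, xi]


def return_input_cpu(img):
    """Emulate the kernel: one loop iteration per (x, y) 'thread'."""
    height, width = img.shape
    output = np.zeros(height * width, dtype=img.dtype)
    for y in range(height):
        for x in range(width):
            # Same row-major index the kernel writes to.
            output[y * width + x] = tex2d_clamp(img, x, y)
    return output.reshape(height, width)


img = np.arange(6, dtype=np.float32).reshape(2, 3)  # 2 rows, 3 columns
result = return_input_cpu(img)
```

Here `result` equals `img`, and a fetch such as `tex2d_clamp(img, -1, 0)` returns the edge pixel `img[0, 0]` rather than raising an index error, which is the padding behaviour the question refers to.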

The Python file:

#!/usr/bin/env python3
"""minimal example: cuda kernel that returns the input using textures"""

import numpy as np
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.autoinit

# get from cuda kernel
with open('./minimal_kernel.cu', 'r') as f:
    mod = SourceModule(f.read())
return_input = mod.get_function('return_input')
my_tex = mod.get_texref('my_tex')

# setup texture
shape = (5, 5)
img_in = np.random.rand(*shape).astype(np.float32)
print(img_in)
cuda.matrix_to_texref(img_in, my_tex, order='C')

# setup output
img_out = np.zeros(shape, dtype=np.float32)

# build grid: x covers the image width (columns), y covers the height (rows)
blocksize = 32
img_height, img_width = np.int32(np.shape(img_in))
grid = (int(np.ceil(img_width / blocksize)),
        int(np.ceil(img_height / blocksize)),
        1)

# call cuda kernel
return_input(img_width,
             img_height,
             cuda.Out(img_out),
             texrefs=[my_tex],
             block=(blocksize, blocksize, 1),
             grid=grid)

print(img_out)
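
The grid computation in the script can also be written with integer arithmetic and wrapped in a small helper. This is only a sketch: `make_grid` is a hypothetical name, not a PyCuda function, and it assumes the kernel maps its x index to image columns and its y index to image rows.

```python
def make_grid(width, height, block_x=32, block_y=32):
    """Return a (grid_x, grid_y, 1) tuple whose blocks cover width x height pixels."""
    grid_x = (width + block_x - 1) // block_x    # integer ceiling division
    grid_y = (height + block_y - 1) // block_y
    return (grid_x, grid_y, 1)


small = make_grid(5, 5)        # a single 32x32 block covers the 5x5 example
large = make_grid(1920, 1080)  # 60 blocks across, 34 blocks down
```

Because the last block in each direction usually overhangs the image, the bounds check inside the kernel is still required.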

Solution 1: accepted, 2019-08-29 15:48:25
