
Kernel Size for 3D Convolution

The kernel size of a 3D convolution is defined by depth, height and width in PyTorch or TensorFlow. For example, for CT/MRI image data with 300 slices, the input tensor can be (1, 1, 300, 128, 128), corresponding to (N, C, D, H, W). The kernel size can then be (3, 3, 3) for depth, height and width. During 3D convolution, the kernel slides along these 3 directions.
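As a minimal sketch of this setup (using PyTorch since the question mentions it; the choice of 16 output channels and the variable names are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Single-channel volumetric input: (N, C, D, H, W)
volume = torch.randn(1, 1, 300, 128, 128)

# 3D convolution with a (3, 3, 3) kernel over depth, height and width
conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=(3, 3, 3))

out = conv3d(volume)
print(out.shape)  # torch.Size([1, 16, 298, 126, 126]) with stride 1, no padding
```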

However, I am confused about what happens if we change the input from CT/MRI to a colour video. Suppose the video has 300 frames; the input tensor will then be (1, 3, 300, 128, 128), because an RGB image has 3 channels. I know that for a single RGB image, the kernel size can be 3x3x3 for channels, height and width. But when it comes to a video, both PyTorch and TensorFlow still use only depth, height and width to set the kernel size. My question is: if we still use a kernel of (3, 3, 3), is there a potential fourth dimension for the colour channels?

Yes.

Actually, the convolution operation performed in a CNN is one dimension higher than its namesake suggests. The channel dimension is always spanned by the entire kernel, though, so there is no sliding along the channel dimension. For example, a 2D convolution layer with kernel size 5x5 applied to a 3-channel input actually uses a kernel of shape 3x5x5 (assuming channels-first notation). Each output channel is the result of convolving the input with a different 3x5x5 kernel, so there is one such 3x5x5 kernel per output channel.
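One quick way to see this is to inspect the weight tensor of a 2D convolution layer, which PyTorch stores with shape (out_channels, in_channels, kH, kW). A short sketch (the 8 output channels are an arbitrary choice for illustration):

```python
import torch.nn as nn

# 2D convolution: 3 input channels, 8 output channels, 5x5 kernel
conv2d = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# The stored weight is 4D: one 3x5x5 kernel per output channel
print(conv2d.weight.shape)  # torch.Size([8, 3, 5, 5])
```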

This is the same for videos. A 3D convolution layer actually performs a 4D convolution in the same way, so for an input of shape 1x3x300x128x128 with kernel size set to 3x3x3, the layer uses kernels of shape 3x3x3x3 (channels x depth x height x width).
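The same check works for the 3D case: the Conv3d weight has shape (out_channels, in_channels, kD, kH, kW), so each output channel comes from its own 4D kernel of shape 3x3x3x3. A sketch under the same assumptions as above (16 output channels chosen for illustration):

```python
import torch
import torch.nn as nn

# RGB video input: (N, C, D, H, W) = (1, 3, 300, 128, 128)
video = torch.randn(1, 3, 300, 128, 128)

# "3D" convolution with kernel_size=(3, 3, 3) over depth, height and width
conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=(3, 3, 3))

# Each of the 16 output channels has its own 3x3x3x3 kernel
print(conv3d.weight.shape)  # torch.Size([16, 3, 3, 3, 3])
print(conv3d(video).shape)  # torch.Size([1, 16, 298, 126, 126])
```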
