
The input to a Conv1D CNN

I'm working in the field of machine learning.

To make my network stronger, I'm going to adopt Conv1D.

The input data is a one-dimensional list, so I thought Conv1D would be the best choice.

What would happen if the input size is (1, 740)? Would it be okay if the input channel is 1?

I mean, I have a feeling that the Conv1D output for a (1, 740) tensor should be the same as that of a simple Linear network.

Of course I'll also include other Conv1d layers, like below:

self.conv1 = torch.nn.Conv1d(in_channels=1, out_channels=64, kernel_size=5)
self.conv2 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
self.conv3 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)
self.conv4 = torch.nn.Conv1d(in_channels=64, out_channels=64, kernel_size=5)

Would it make sense with an input channel of 1?

Thanks in advance. :)

I think it's fine. Note that the input of Conv1D should be (B, N, M), where B is the batch size, N is the number of channels (e.g. 3 for RGB), and M is the number of features. The out_channels argument refers to the number of length-5 filters to use. Look at the output shape of the following code:

import torch
import torch.nn as nn

k = nn.Conv1d(1, 64, kernel_size=5)
input = torch.randn(1, 1, 740)
print(k(input).shape) # -> torch.Size([1, 64, 736])

The 736 is the result of not using padding: with kernel_size=5 the output width is 740 - 5 + 1 = 736, so the input dimension isn't kept.
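If you want the output width to match the input, you can pad; for an odd kernel size k, padding=(k-1)//2 preserves the width (a minimal sketch, not from the original answer):

```python
import torch
import torch.nn as nn

# kernel_size=5 with padding=2 keeps the 740-wide input at 740
conv = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=5, padding=2)
x = torch.randn(1, 1, 740)
print(conv(x).shape)  # torch.Size([1, 64, 740])
```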

The nn.Conv1d layer takes an input of shape (b, c, w) (where b is the batch size, c the number of channels, and w the input width). Its kernel size is one-dimensional. It performs a convolution operation over the input dimension (batch and channel axes aside). This means the kernel will apply the same operation over the whole input (whether 1D, 2D, or 3D), like a 'sliding window'. As such, it only has kernel_size weights per input/output channel pair. This is the main characteristic of a convolution layer.
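This weight sharing can be checked by counting parameters: a Conv1d layer's parameter count depends only on kernel_size and the channel counts, never on the input width (my own sketch, not from the answer):

```python
import torch.nn as nn

# one input channel, one output channel, kernel of length 5
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=5)
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)  # 6: 5 kernel weights + 1 bias, regardless of input width
```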

Conv1d allows you to extract features regardless of where they are located in the input: at the beginning or at the end of your w-wide input. This makes sense if your input is temporal (a sequence over time) or spatial (an image).

On the other hand, nn.Linear takes a 1D tensor as input and returns another 1D tensor. You could consider w to be the number of input neurons. You would end up having w*output_dim weights (plus output_dim biases). If your input contains components which are independent from one another (like a one/multi-hot encoding), then a fully connected layer such as nn.Linear is preferred.
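For comparison with the Conv1d count above, here is the parameter count of a fully connected layer on the 740-wide input from the question (my own numbers, for illustration):

```python
import torch.nn as nn

# mapping 740 inputs to 64 outputs: one weight per connection, one bias per output
fc = nn.Linear(in_features=740, out_features=64)
n_params = sum(p.numel() for p in fc.parameters())
print(n_params)  # 740*64 + 64 = 47424, and it grows with the input width
```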

These two behave differently. When using nn.Linear in scenarios where you should use nn.Conv1d, your training would ideally result in neurons of equal weights, if that makes sense... but you probably won't get that. Fully-connected layers were used in the past in deep learning for computer vision. Today convolutions are used because they are much more efficient and better suited to these types of tasks.
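To connect back to the question's intuition (my own check, not part of either answer): a Conv1d whose kernel spans the entire input slides over exactly one position, computing one dot product per filter, so it matches an nn.Linear layer that is given the same weights:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 1, 740)

# Conv1d whose kernel covers the whole 740-wide input: one output per filter
conv = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=740)

# Linear layer with the conv weights copied in
fc = nn.Linear(in_features=740, out_features=64)
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(64, 740))
    fc.bias.copy_(conv.bias)

out_conv = conv(x).view(1, 64)   # conv output is (1, 64, 1)
out_fc = fc(x.view(1, 740))
print(torch.allclose(out_conv, out_fc, atol=1e-5))  # True
```

With any smaller kernel, the two are no longer equivalent: the conv reuses its few weights at every position, while the Linear layer has an independent weight per input element.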
