
Image deconvolution with a CNN

I have an input tensor of shape (C, H, W), where H = W and C = W^2. This tensor contains non-linearly transformed information for an image of shape (1, H, W) squeezed to (H, W). The exact form of the transformation is not important (and there is no closed-form expression for it anyway). I would like to design a CNN to estimate images from such tensors. I realize that I will have to experiment with CNN architectures (since I don't have the exact form of the transformation), but I'm not sure exactly how to proceed.

The input tensor has both positive and negative values which are important for the image reconstruction, so a ReLU layer probably should not be placed near the beginning of the CNN. I don't think that pooling layers would be useful either, at least in the H and W dimensions. Clearly, I have to collapse the C dimension to get the image, but I don't think it should be done all at once; e.g., torch.nn.Conv2d(C, 1, kernel_size) is probably not a good idea.

It seems to me that I should first use a Conv2d layer that produces a tensor of the same size as the input (to partially unscramble the non-linear transformation). But if the kernel size is greater than one, the H and W dimensions will shrink, which I don't want (unless this can be corrected later in the CNN). On the other hand, if the kernel size is one, the shape stays the same, but I don't think anything useful happens to the tensor in that case. I will also probably have to include linear layers, but I'm not sure how to use them with 3D tensors.
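(Regarding the last point, a minimal sketch with illustrative shapes: PyTorch's nn.Linear acts on the last dimension of an N-dimensional input, so it can be applied to a 3D tensor directly without reshaping.)

```python
import torch
import torch.nn as nn

# Illustrative shapes only; in the real problem C = W**2.
C, H, W = 16, 4, 4
x = torch.randn(C, H, W)

# nn.Linear operates on the last dimension of an N-D tensor,
# so it mixes values along W while leaving C and H untouched.
fc = nn.Linear(W, W)
y = fc(x)  # shape unchanged: (C, H, W)
```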

Any suggestions would be welcome.

There's no problem with applying a ReLU layer near the beginning, as long as you apply a weighted linear layer first. If the net learns that it needs the values there, it can apply a negative weight to preserve the information (roughly speaking).
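A toy demonstration of this point (the shapes and weights are hand-picked for illustration): a 1x1 convolution before the ReLU can split a signal into a "positive" and a "negated" channel, so sign information survives the ReLU and the original values remain recoverable.

```python
import torch
import torch.nn as nn

# A tiny (1, 1, 2, 2) input with negative values.
x = torch.tensor([[-1.0, 2.0], [3.0, -4.0]]).reshape(1, 1, 2, 2)

# 1x1 conv with two output channels: one copies the input,
# one negates it. After ReLU, the positive and negative parts
# each survive in their own channel.
conv = nn.Conv2d(1, 2, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight[0, 0, 0, 0] = 1.0   # identity channel
    conv.weight[1, 0, 0, 0] = -1.0  # negated channel

y = torch.relu(conv(x))

# relu(x) - relu(-x) == x, so nothing was lost.
recovered = y[:, 0] - y[:, 1]
```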

In fact, a useful thing to do in some networks is to normalize the input to zero mean and unit variance, i.e., to fit an N(0, 1) normal distribution. See https://www.researchgate.net/post/Which_data_normalization_method_should_be_used_in_this_artificial_neural_network

As to the problem of the H/W dimensions shrinking because of kernel sizes: you can use zero-padding on the borders to avoid this. In my experience, networks usually handle this well. However, if performance is an issue, you might want to reduce the resolution significantly and then do upscaling of some sort at the end. You can find an example of such a network here: Create image of Neural Network structure
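For example (shapes illustrative), with kernel_size=3 and padding=1 the output spatial size equals the input size, since H_out = H + 2*padding - kernel_size + 1:

```python
import torch
import torch.nn as nn

C, H, W = 16, 8, 8

# kernel_size=3 with padding=1 preserves the spatial dimensions.
conv = nn.Conv2d(C, C, kernel_size=3, padding=1)

x = torch.randn(1, C, H, W)
y = conv(x)  # same shape as x: (1, C, H, W)
```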

As for pooling/feature layers: because the depth of the tensor is very large (W^2), I would actually suggest that you reduce it substantially right away. The cost of your network grows with the product of the input and output channel counts of each layer (every output channel has weights from every input channel), so it is roughly quadratic in the depth you carry through the network, on top of the per-pixel cost. My basic strategy would therefore be to shrink the information space quickly at the beginning, do a few layers of computation, and then upscale.
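A rough sketch of that strategy, not a tuned architecture: all layer widths here are illustrative guesses (with C = W^2 = 64 as a toy size), and the channel dimension is collapsed in stages rather than all at once.

```python
import torch
import torch.nn as nn

# Toy sizes: C = W**2 = 64.
C, H, W = 64, 8, 8

model = nn.Sequential(
    nn.Conv2d(C, 32, kernel_size=1),              # cheap channel reduction
    nn.ReLU(),
    nn.Conv2d(32, 16, kernel_size=3, padding=1),  # spatial mixing, H/W kept
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),   # final single-channel image
)

x = torch.randn(1, C, H, W)
img = model(x)  # shape (1, 1, H, W)
```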

What I've learned over the years is that CNNs are pretty resilient, and that architectural ideas which seem good on paper often do very little in practice. The factors that help most are almost always more layers (applied well, which has become much easier since ResNet) and more/better data. So I would start experimenting: get a working proof of concept, then assess what is blocking the network and try variations.

I hope this makes enough sense :) Good luck!

