Issues Training CNN with Prime number input dimensions

I am currently developing a CNN model with Keras (an autoencoder). This time my inputs are of shape (47,47,3), that is, a 47x47 image with 3 (RGB) channels.

I have worked with some CNNs in the past, but this time my input dimensions are prime numbers (47 pixels). This, I think, is causing issues with my implementation, specifically when using MaxPooling2D and UpSampling2D in my model. I noticed that some dimensions are lost when max pooling and then up sampling.

Using model.summary() I can see that after passing my (47,47,3) input through a Conv2D(24) and MaxPooling with a (2,2) kernel (that is, 24 filters and half the spatial size), I get an output shape of (24, 24, 24).

Now, if I try to reverse that by UpSampling with a (2,2) kernel (doubling the shape) and convolving again, I get a (48,48,3) shaped output. That is one more row and column than needed.
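For reference, here is a minimal sketch that reproduces the mismatch; the 3x3 kernels, activations, and padding='same' are assumptions, not necessarily the original model:

```python
# Minimal sketch reproducing the shape mismatch described above.
# The 3x3 kernels, activations and padding='same' are assumptions.
from tensorflow.keras import layers, models

inp = layers.Input(shape=(47, 47, 3))
x = layers.Conv2D(24, (3, 3), activation='relu', padding='same')(inp)    # (47, 47, 24)
x = layers.MaxPooling2D((2, 2), padding='same')(x)                       # (24, 24, 24)
x = layers.UpSampling2D((2, 2))(x)                                       # (48, 48, 24)
out = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)  # (48, 48, 3)

models.Model(inp, out).summary()  # ends at (48, 48, 3): one extra row and column
```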

To this I thought, "no problem, just choose a kernel size that gives you the desired 47 pixels when up sampling", but given that 47 is a prime number, it seems to me that there is no kernel size that can do that.

Is there any way to bypass this problem that does not involve changing my input dimensions to a non-prime? Maybe I am missing something in my approach, or maybe Keras has some feature I am not aware of that could help here.

I advise you to use ZeroPadding2D and Cropping2D. You can pad your image asymmetrically with zeros and obtain an even size without resizing it. This should solve the problem with upsampling. Moreover, remember to set padding='same' in all of your convolutional layers.
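For example, a minimal sketch of these two layers on a 47-pixel input; the ((0, 1), (0, 1)) amounts (one extra row at the bottom, one extra column on the right) are an assumption that fits the shapes above:

```python
# Asymmetric zero padding takes 47 -> 48; cropping reverses it, 48 -> 47.
from tensorflow.keras import Input, layers

x = Input(shape=(47, 47, 3))
padded = layers.ZeroPadding2D(padding=((0, 1), (0, 1)))(x)      # (48, 48, 3)
cropped = layers.Cropping2D(cropping=((0, 1), (0, 1)))(padded)  # (47, 47, 3)
```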

EDIT:

Just to give you an example strategy on how to perform such operations:

  1. If the size of your feature map is odd before pooling, zero pad it to make it even.
  2. After the corresponding upsampling operation, use cropping to bring your feature map back to its original odd size (see the sketch after this list).
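A sketch of this strategy applied to the (47,47,3) input from the question; the filter counts, kernel sizes and activations are illustrative assumptions:

```python
# Illustrative autoencoder: pad the odd 47 to 48 before pooling, then crop
# back to 47 after the final upsampling. Layer widths are assumptions.
from tensorflow.keras import layers, models

inp = layers.Input(shape=(47, 47, 3))

# Encoder: 47 is odd, so zero pad to 48 before the first pooling step.
x = layers.ZeroPadding2D(((0, 1), (0, 1)))(inp)                        # (48, 48, 3)
x = layers.Conv2D(24, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2))(x)                                     # (24, 24, 24)
x = layers.Conv2D(48, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2))(x)                                     # (12, 12, 48)

# Decoder: mirror each pooling with an upsampling, then crop 48 back to 47.
x = layers.Conv2D(48, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)                                     # (24, 24, 48)
x = layers.Conv2D(24, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)                                     # (48, 48, 24)
x = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)  # (48, 48, 3)
out = layers.Cropping2D(((0, 1), (0, 1)))(x)                           # (47, 47, 3)

models.Model(inp, out).summary()  # output shape matches the (47, 47, 3) input
```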
