Image resizing method during preprocessing for neural network

Original post: 2016-12-12 13:52:00. Tags: image, machine-learning, neural-network, classification, conv-neural-network

I am new to machine learning. I am trying to create an input matrix (X) from a set of images (the Stanford Dogs dataset, 120 breeds) to train a convolutional neural network. I aim to resize the images and turn each one into a row, with each pixel as a separate column.

If I directly resize images to a fixed size, they get squished or stretched and lose their original proportions, which is not good (first solution).

I can resize by fixing either the width or the height and then crop (so all resulting images have the same size, e.g. 100x100), but critical parts of the image may be cropped out (second solution).
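The arithmetic behind this second solution can be sketched as follows. This is only an illustration under stated assumptions: the function name `resize_then_crop_box` is hypothetical, the shorter side is scaled to the target, and the crop is centred (other crop positions are equally possible).

```python
def resize_then_crop_box(width, height, target=100):
    """Return the intermediate size and the centre-crop box for a target x target output."""
    # Scale so the shorter side equals `target`, keeping the aspect ratio.
    scale = target / min(width, height)
    new_w = max(target, round(width * scale))
    new_h = max(target, round(height * scale))
    # Centre the crop window; anything outside it is discarded.
    left = (new_w - target) // 2
    top = (new_h - target) // 2
    return (new_w, new_h), (left, top, left + target, top + target)


size, box = resize_then_crop_box(500, 375)
print(size, box)  # (133, 100) (16, 0, 116, 100)
```

For a 500x375 image, the resized image is 133 pixels wide and only a 100-pixel-wide window survives, which is where the risk of cutting off a critical part comes from.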

I am thinking of another way of doing it, but I am not sure about it. Assume I want 10000 columns per image. Instead of resizing images to 100x100, I will resize each image so that its total pixel count is around 10000. So images of size 50x200, 100x100 and 250x40 will all be converted into 10000 columns. For other sizes like 52x198, only the first 10000 of the 10296 pixels will be used (third solution).

The third solution I mentioned above seems to preserve the original shape of the image. However, it may lose all of that when the image is converted into a row, since not all images are of the same size. I would appreciate your comments on this issue. It would also be great if you could point me to sources where I can learn more about the topic.

1 Answer

Solution 1 (simply resizing the input image) is a common approach. Unless your images have a very different aspect ratio from the expected input shape (or your target classes have tight geometric constraints), you can usually still get good performance.
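As a rough illustration of Solution 1, here is a minimal NumPy sketch that resizes every image to a fixed 100 x 100 shape and stacks the results as rows of X. The helper names `resize_nearest` and `images_to_matrix` are hypothetical, and nearest-neighbour sampling is used only to keep the example self-contained; in practice an image library with proper interpolation would normally do the resizing.

```python
import numpy as np


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to (out_h, out_w)."""
    in_h, in_w = image.shape[:2]
    # For each output row/column, pick the index of the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]


def images_to_matrix(images, size=100):
    """Resize every image to size x size and stack the flattened results as rows of X."""
    return np.stack([resize_nearest(img, size, size).reshape(-1) for img in images])


# Two synthetic greyscale "images" with different aspect ratios.
images = [np.zeros((375, 500)), np.ones((200, 120))]
X = images_to_matrix(images)
print(X.shape)  # (2, 10000)
```

Because every row comes from a known 100 x 100 layout, each row can be reshaped back to 100 x 100 before it is fed to the network.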

As you mentioned, Solution 2 (cropping your image) has the drawback of potentially excluding a critical part of your image. You can get around that by running the classification on multiple subwindows of the original image (i.e., classify multiple 100 x 100 sub-images by stepping over the input image horizontally and/or vertically at an appropriate stride). Then, you need to decide how to combine your multiple classification results.
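The subwindow idea can be sketched like this. Everything here is an assumption made for illustration: the stride of 50, the helper names, and in particular the choice to combine results by averaging per-window class probabilities, which is just one possible combination rule. `dummy_predict_proba` is a stand-in for a trained classifier, not a real model.

```python
import numpy as np


def sliding_windows(image, size=100, stride=50):
    """Yield size x size crops; the image is assumed to be at least size x size."""
    h, w = image.shape[:2]
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            yield image[top:top + size, left:left + size]


def classify_by_averaging(image, predict_proba, size=100, stride=50):
    """Average the per-window class probabilities and return the winning class index."""
    probs = np.stack([predict_proba(win) for win in sliding_windows(image, size, stride)])
    return int(probs.mean(axis=0).argmax())


def dummy_predict_proba(window):
    # Stand-in for a trained model: "class 1" probability grows with mean brightness.
    p = float(window.mean())
    return np.array([1.0 - p, p])


image = np.zeros((200, 300))
image[:, 100:] = 1.0  # the right two thirds of the image are bright
print(len(list(sliding_windows(image))))  # 15
print(classify_by_averaging(image, dummy_predict_proba))  # 1
```

Note that with this simple stepping, pixels at the right and bottom edges are skipped whenever the image size minus the window size is not a multiple of the stride.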

Solution 3 will not work because the convolutional network needs to know the image dimensions (otherwise, it wouldn't know which pixels are horizontally and vertically adjacent). So you need to pass an image with explicit dimensions (e.g., 100 x 100) unless the network expects an array that was flattened from assumed dimensions. But if you simply pass an array of 10000 pixel values and the network doesn't know (or can't assume) whether the image was 100 x 100, 50 x 200, or 250 x 40, then the network can't apply the convolutional filters properly.
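A small NumPy example makes the adjacency problem concrete. It assumes row-major flattening and uses pixel indices as stand-in pixel values; the same 10000-element vector gives different vertical neighbours depending on which shape is assumed.

```python
import numpy as np

# One flat vector of 10000 pixel values, as in the proposed third solution.
flat = np.arange(10000)

# The same vector interpreted with two different (height, width) assumptions.
as_100x100 = flat.reshape(100, 100)
as_50x200 = flat.reshape(50, 200)

# The pixel directly below the top-left pixel differs between the two readings.
print(as_100x100[1, 0])  # 100
print(as_50x200[1, 0])   # 200
```

A 3 x 3 filter applied under the wrong assumed shape would therefore mix pixels that were never neighbours in the original image.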

Solution 1 is clearly the easiest to implement, but you need to balance the likely effect of changing the image aspect ratios against the level of effort required for running and combining multiple classifications for each image.