What does the tf.extract_image_patches method do?

Question

I want to divide my images into smaller windows that will be sent to a neural net for training (e.g. for training face detectors). I found the tf.extract_image_patches method in TensorFlow, which seemed like exactly what I need. This question explains what it does.

The example there shows an input of shape (1x10x10x1) (the numbers 1 through 100 in order), with ksizes of (1, 3, 3, 1) and strides of (1, 5, 5, 1). The output is this:

 [[[[ 1  2  3 11 12 13 21 22 23]
    [ 6  7  8 16 17 18 26 27 28]]

   [[51 52 53 61 62 63 71 72 73]
    [56 57 58 66 67 68 76 77 78]]]]

But I'd expect windows like this (of shape (Nx3x3x1), i.e. N patches/windows of size 3x3):

[[[1, 2, 3]
  [11, 12, 13]
  [21, 22, 23]]
    ...

So why are all patch values stored in 1D? Does it mean that this method is not meant for the purpose I described above, and that I can't use it to prepare batches for training? I also found another method for extracting patches, sklearn.feature_extraction.image.extract_patches_2d, and this one really does what I was expecting. So should I understand that these two methods don't do the same thing?

Answer 1

Correct, these functions return different tensors (multi-dimensional arrays).

First, the tf.extract_image_patches documentation reads:

Returns:

A Tensor. Has the same type as images. 4-D Tensor with shape [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth] containing image patches with size ksize_rows x ksize_cols x depth vectorized in the "depth" dimension. Note out_rows and out_cols are the dimensions of the output patches.

Basically, this says that the [1, 2, 3], [11, 12, 13], [21, 22, 23] rows of each window are flattened, or vectorized, in the "depth" dimension. The out_rows and out_cols values are calculated from the strides argument, which in this case is strides=[1, 5, 5, 1], and from padding, which is 'VALID'. As a result, the output shape is (1, 2, 2, 9).

In other words:

strides changes the spatial dimensions
ksizes changes the depth
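The shape arithmetic for this example can be sketched as below. This is a minimal sketch that assumes rates of 1 and the usual 'VALID' rule out = ceil((in - ksize + 1) / stride); the helper name valid_out_size is hypothetical and not part of TensorFlow. Under this rule the window size also enters the spatial size, although the stride dominates here.

```python
import math


def valid_out_size(in_size: int, ksize: int, stride: int) -> int:
    """Window positions along one axis with 'VALID' padding (assumes rate 1)."""
    return math.ceil((in_size - ksize + 1) / stride)


# Values from the example: 10x10x1 image, 3x3 windows, stride 5.
in_rows = in_cols = 10
depth = 1
k_rows = k_cols = 3
s_rows = s_cols = 5

out_rows = valid_out_size(in_rows, k_rows, s_rows)
out_cols = valid_out_size(in_cols, k_cols, s_cols)

# [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth]
out_shape = (1, out_rows, out_cols, k_rows * k_cols * depth)
print(out_shape)  # (1, 2, 2, 9)
```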

Note that the output tensor does contain all individual windows, so you can access them through selection.
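To illustrate that selection, here is a rough NumPy sketch. It does not call TensorFlow; it rebuilds the flattened (1, 2, 2, 9) output from the example by slicing, then recovers the (N, 3, 3, 1) windows with a reshape. The variable names are made up for illustration, and the assumption is that the same reshape applies to the real output tensor.

```python
import numpy as np

# 10x10 single-channel image holding 1..100, as in the question.
image = np.arange(1, 101).reshape(1, 10, 10, 1)

ksize, stride = 3, 5

# Emulate the flattened output for 'VALID' padding: one 9-vector per window.
starts = range(0, 10 - ksize + 1, stride)  # window origins: 0 and 5
flat = np.array([[[image[0, r:r + ksize, c:c + ksize, :].reshape(-1)
                   for c in starts]
                  for r in starts]])
print(flat.shape)  # (1, 2, 2, 9)

# Select a single window by its output position and restore its 2-D layout.
window = flat[0, 1, 0].reshape(ksize, ksize)
print(window)  # rows [51 52 53], [61 62 63], [71 72 73]

# Or turn everything into N windows of shape 3x3x1 at once.
# With a TensorFlow tensor, tf.reshape(patches, [-1, 3, 3, 1]) plays this role.
windows = flat.reshape(-1, ksize, ksize, 1)
print(windows.shape)  # (4, 3, 3, 1)
```

The reshape works because each window is flattened row by row, so its values come back in their original 3x3 arrangement.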

On the other hand, sklearn.feature_extraction.image.extract_patches_2d:

Returns:

patches : array, shape = (n_patches, patch_height, patch_width) or (n_patches, patch_height, patch_width, n_channels). The collection of patches extracted from the image, where n_patches is either max_patches or the total number of patches that can be extracted.

This is exactly what you describe: each window keeps its full spatial dimensions patch_height, patch_width. Here, the result shape depends on patch_size, striding and padding are not supported, and the first dimension is calculated as the total number of patches.
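For comparison, a short sketch of the scikit-learn call on the same 10x10 input, assuming scikit-learn and NumPy are installed. Because there is no stride option, every 3x3 position is extracted, which gives 8 x 8 = 64 patches.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

# Same 10x10 image of 1..100, here as a plain 2-D array.
image2d = np.arange(1, 101).reshape(10, 10)

# Extract every 3x3 patch; max_patches is left at its default (all patches).
patches = extract_patches_2d(image2d, (3, 3))
print(patches.shape)  # (64, 3, 3)
print(patches[0])     # top-left window: [1 2 3], [11 12 13], [21 22 23]
```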
