Understanding tf.extract_image_patches for extracting patches from an image

I found the method tf.extract_image_patches in the TensorFlow API, but its functionality is not clear to me.

Say batch_size = 1, the image is of size 225x225x3, and we want to extract patches of size 32x32.

How exactly does this function behave? Specifically, the documentation gives the shape of the output tensor as [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth], but it does not say what out_rows and out_cols are.

Ideally, given an input image tensor of size 1x225x225x3 (where 1 is the batch size), I want to get Kx32x32x3 as output, where K is the total number of patches and 32x32x3 is the dimension of each patch. Is there something in TensorFlow that already achieves this?

Here is how the method works:

  • ksizes sets the dimensions of each patch, in other words how many pixels each patch contains.
  • strides is the distance between the start of one patch and the start of the next patch in the original image.
  • rates is the sampling step inside a patch: consecutive pixels of a patch are taken rates pixels apart in the original image. (The example below helps illustrate this.)
  • padding is either "VALID", which means every patch must be fully contained in the image, or "SAME", which means patches may extend past the image edge (the missing pixels are filled with zeros).
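The number of patches along each axis follows from these four parameters. The sketch below assumes the op uses the usual TensorFlow windowed-output-size formulas, which agree with the sample outputs in this answer; the helper name patch_grid_size is hypothetical and not part of TensorFlow.

```python
def patch_grid_size(in_size, ksize, stride, rate=1, padding="VALID"):
    """Number of patches along one axis (out_rows or out_cols).

    Hypothetical helper, assuming the standard TensorFlow output-size rules.
    """
    # With rate > 1 a patch spans more input pixels than it contains.
    effective_ksize = ksize + (ksize - 1) * (rate - 1)
    if padding == "VALID":
        if in_size < effective_ksize:
            return 0  # no patch fits entirely inside the image
        return (in_size - effective_ksize) // stride + 1
    if padding == "SAME":
        return -(-in_size // stride)  # ceil(in_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


# 10x10 image, 3x3 patches, stride 5
print(patch_grid_size(10, 3, 5))      # 2
# The question's case: 225x225 image, 32x32 patches, stride 32
print(patch_grid_size(225, 32, 32))   # 7
```

Under these assumptions, the 225x225 image from the question with non-overlapping 32x32 patches and "VALID" padding would give a 7x7 grid, that is K = 49 patches.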

Here is some sample code with output to help demonstrate how it works:

# Written for TensorFlow 1.x (tf.Session). In TensorFlow 2.x the op is exposed as
# tf.image.extract_patches, which takes sizes= instead of ksizes=.
import tensorflow as tf

n = 10
# images is a 1 x 10 x 10 x 1 array that contains the numbers 1 through 100 in order
images = [[[[x * n + y + 1] for y in range(n)] for x in range(n)]]

# We generate four outputs as follows:
# 1. 3x3 patches with stride length 5
# 2. Same as above, but the rate is increased to 2
# 3. 4x4 patches with stride length 7; only one patch should be generated
# 4. Same as above, but with padding set to 'SAME'
with tf.Session() as sess:
  print(tf.extract_image_patches(images=images, ksizes=[1, 3, 3, 1], strides=[1, 5, 5, 1], rates=[1, 1, 1, 1], padding='VALID').eval(), '\n\n')
  print(tf.extract_image_patches(images=images, ksizes=[1, 3, 3, 1], strides=[1, 5, 5, 1], rates=[1, 2, 2, 1], padding='VALID').eval(), '\n\n')
  print(tf.extract_image_patches(images=images, ksizes=[1, 4, 4, 1], strides=[1, 7, 7, 1], rates=[1, 1, 1, 1], padding='VALID').eval(), '\n\n')
  print(tf.extract_image_patches(images=images, ksizes=[1, 4, 4, 1], strides=[1, 7, 7, 1], rates=[1, 1, 1, 1], padding='SAME').eval())

Output:

[[[[ 1  2  3 11 12 13 21 22 23]
   [ 6  7  8 16 17 18 26 27 28]]

  [[51 52 53 61 62 63 71 72 73]
   [56 57 58 66 67 68 76 77 78]]]]


[[[[  1   3   5  21  23  25  41  43  45]
   [  6   8  10  26  28  30  46  48  50]]

  [[ 51  53  55  71  73  75  91  93  95]
   [ 56  58  60  76  78  80  96  98 100]]]]


[[[[ 1  2  3  4 11 12 13 14 21 22 23 24 31 32 33 34]]]]


[[[[  1   2   3   4  11  12  13  14  21  22  23  24  31  32  33  34]
   [  8   9  10   0  18  19  20   0  28  29  30   0  38  39  40   0]]

  [[ 71  72  73  74  81  82  83  84  91  92  93  94   0   0   0   0]
   [ 78  79  80   0  88  89  90   0  98  99 100   0   0   0   0   0]]]]

So, for example, our first result looks like the following (pixels that belong to a patch are marked with *):

 *  *  *  4  5  *  *  *  9 10 
 *  *  * 14 15  *  *  * 19 20 
 *  *  * 24 25  *  *  * 29 30 
31 32 33 34 35 36 37 38 39 40 
41 42 43 44 45 46 47 48 49 50 
 *  *  * 54 55  *  *  * 59 60 
 *  *  * 64 65  *  *  * 69 70 
 *  *  * 74 75  *  *  * 79 80 
81 82 83 84 85 86 87 88 89 90 
91 92 93 94 95 96 97 98 99 100 

As you can see, we have 2 rows and 2 columns of patches, which is what out_rows and out_cols are.
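To make the indexing concrete, here is a minimal NumPy sketch that mimics the "VALID" case only. It is a re-implementation for illustration, not TensorFlow's actual code, and the function name extract_patches_valid is hypothetical. It reproduces the first two outputs shown above.

```python
import numpy as np


def extract_patches_valid(image, ksize, stride, rate=1):
    """Illustrative NumPy version of 'VALID' patch extraction.

    image: array of shape (H, W, C).
    Returns an array of shape (out_rows, out_cols, ksize * ksize * C).
    """
    h, w, c = image.shape
    # Span of one patch in the input once the sampling rate is applied.
    eff = ksize + (ksize - 1) * (rate - 1)
    out_rows = (h - eff) // stride + 1
    out_cols = (w - eff) // stride + 1
    out = np.empty((out_rows, out_cols, ksize * ksize * c), dtype=image.dtype)
    for i in range(out_rows):
        for j in range(out_cols):
            r0, c0 = i * stride, j * stride
            # Take every rate-th pixel, then flatten in (row, col, depth) order.
            patch = image[r0:r0 + eff:rate, c0:c0 + eff:rate, :]
            out[i, j] = patch.reshape(-1)
    return out


n = 10
image = np.arange(1, n * n + 1).reshape(n, n, 1)

print(extract_patches_valid(image, ksize=3, stride=5)[0, 0])
# [ 1  2  3 11 12 13 21 22 23]
print(extract_patches_valid(image, ksize=3, stride=5, rate=2)[1, 1])
# [ 56  58  60  76  78  80  96  98 100]
```

The first two axes of the result are out_rows and out_cols, and the last axis holds one flattened patch, matching the [batch, out_rows, out_cols, ksize_rows * ksize_cols * depth] layout without the batch dimension.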

To expand on Neal's detailed answer, there are a lot of subtleties with zero padding when using "SAME", since extract_image_patches tries to center the patches in the image if possible. Depending on the stride, there may or may not be padding on the top and left, and the first patch doesn't necessarily start in the upper left.

For example, extending the previous example:

print(tf.extract_image_patches(images, [1, 3, 3, 1], [1, n, n, 1], [1, 1, 1, 1], 'SAME').eval()[0])

Here n is used as the stride. With a stride of n=1, the image is padded with zeros all around and the first patch starts with padding. Other strides pad the image only on the right and bottom, or not at all. With a stride of n=10, the single patch starts at element 34 (in the middle of the image).

tf.extract_image_patches is implemented by the Eigen library, as described in this answer. You can study that code to see exactly how patch positions and padding are computed.

Introduction

Here I would like to present a simple demonstration of using tf.image.extract_patches with actual images. I have found few examples of the method applied to real images with proper visualizations, so here is one.

The image we will use is of size (256, 256, 3). The patches we will extract are of shape (128, 128, 3). This means that we will retrieve 4 tiles from the image.

Data used

I will be using the flowers dataset. Since this answer needs a small data pipeline, I am linking my Kaggle kernel, which covers consuming the dataset with the tf.data.Dataset API.

Once the data pipeline is set up, we go through the following code snippets.

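import matplotlib.pyplot as plt
import tensorflow as tf

# train_ds is the batched tf.data.Dataset built in the linked Kaggle kernel
images, _ = next(iter(train_ds.take(1)))

image = images[0]
plt.imshow(image.numpy().astype("uint8"))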

[Image: flower]

Here we take one image from the batch and visualize it as is.

image = tf.expand_dims(image,0) # To create the batch information
patches = tf.image.extract_patches(images=image,
                                   sizes=[1, 128, 128, 1],
                                   strides=[1, 128, 128, 1],
                                   rates=[1, 1, 1, 1],
                                   padding='VALID')

With this snippet, we extract patches of size (128, 128) from the image of size (256, 256). In other words, the image is split into 4 tiles.

Visualization

plt.figure(figsize=(10, 10))
for imgs in patches:
    count = 0
    for r in range(2):
        for c in range(2):
            ax = plt.subplot(2, 2, count+1)
            plt.imshow(tf.reshape(imgs[r,c],shape=(128,128,3)).numpy().astype("uint8"))
            count += 1

[Image: the flower split into patches]
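The original question asked for a Kx32x32x3 tensor rather than a grid of flattened patches. Since each entry of the last axis is one patch flattened in (rows, cols, depth) order, a single reshape should be enough. The sketch below uses NumPy so that it runs without the dataset: the patches array is a stand-in built by hand under that layout assumption, not an actual extract_patches result. In TensorFlow the analogous call would presumably be tf.reshape(patches, [-1, 128, 128, 3]).

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

# Stand-in for the extract_patches output with 128x128 tiles and stride 128:
# shape (batch=1, out_rows=2, out_cols=2, 128 * 128 * 3).
patches = (image.reshape(2, 128, 2, 128, 3)
                .transpose(0, 2, 1, 3, 4)
                .reshape(1, 2, 2, 128 * 128 * 3))

# Collapse batch, out_rows and out_cols into K and unflatten each patch.
tiles = patches.reshape(-1, 128, 128, 3)
print(tiles.shape)  # (4, 128, 128, 3)

# Tiles are ordered row by row: index 1 is the top-right tile.
print(np.array_equal(tiles[1], image[:128, 128:]))  # True
```

With a batch size larger than 1, the patches of all images end up stacked along the first axis, so K then counts patches across the whole batch.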
