Tensorflow - Is there a way to implement tensor-wise image shear/rotation/translation?

Question

I am trying to do different kinds of (image) data augmentation for training my neural network.

I know that tf.image offers some augmentation functions, but they are too simple. For example, I can only rotate an image by multiples of 90 degrees, not by an arbitrary angle.

I also know that tf.keras.preprocessing.image offers random rotation, random shear, random shift and random zoom. However, these methods can only be applied to NumPy arrays, not to tensors.

I know I can read the images first, use functions from tf.keras.preprocessing.image to do the augmentation, and then convert the augmented NumPy arrays to tensors.

However, I wonder whether there is a way to implement tensor-wise augmentations, so that I don't need to bother with the "image file -> tensor -> numpy array -> tensor" procedure.

Update for those who want to know how to apply a transform:

For the detailed source code, you may want to check tf.contrib.image.transform and tf.contrib.image.matrices_to_flat_transforms.

Here is my code:

import tensorflow as tf  # written for TensorFlow 1.x (uses tf.contrib)

def transformImg(imgIn, forward_transform):
    # forward_transform must be a float matrix:
    # [[2.0,0,0],[0,1.0,0],[0,0,1]] will work,
    # but [[2,0,0],[0,1,0],[0,0,1]] will not.
    # tf.contrib.image.transform maps output points to input points,
    # so the forward matrix is inverted before it is flattened.
    t = tf.contrib.image.matrices_to_flat_transforms(tf.linalg.inv(forward_transform))
    imgOut = tf.contrib.image.transform(imgIn, t, interpolation="BILINEAR", name=None)
    return imgOut

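Basically, the code above computes

(x', y', 1)^T = forward_transform * (x, y, 1)^T

for every point (x, y) in imgIn, where (x', y') is the position of that point in the output image.

A shear transform parallel to the x axis, for example, is

forward_transform = [[1, shear_lambda, 0], [0, 1, 0], [0, 0, 1]], i.e. x' = x + shear_lambda * y and y' = y.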

Therefore, we can implement a shear transform like this (using transformImg() defined above):

def shear_transform_example(filename,shear_lambda):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    img = transformImg(image_decoded, [[1.0,shear_lambda,0],[0,1.0,0],[0,0,1.0]])
    return img

img = shear_transform_example("white_square.jpg", 0.1)

Original image: [image not preserved]

After transform: [image not preserved]

(Note that img is a tensor; the code to convert tensors to image files is not included.)
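The same pattern should cover the rotation and translation cases from the title: only the forward matrix changes. Below is a minimal sketch in plain Python for building such matrices. The helper names (matmul3, translation_matrix, rotation_matrix, shear_x_matrix, rotation_about, apply_to_point) are made up for illustration and are not part of any library. It assumes the (x, y, 1) column-vector convention described above, with x pointing right and y pointing down, so a positive angle looks clockwise on screen.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation_matrix(dx, dy):
    return [[1.0, 0.0, float(dx)],
            [0.0, 1.0, float(dy)],
            [0.0, 0.0, 1.0]]

def rotation_matrix(angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def shear_x_matrix(shear_lambda):
    return [[1.0, float(shear_lambda), 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

def rotation_about(angle_rad, cx, cy):
    """Rotate around (cx, cy): move that point to the origin, rotate, move back."""
    return matmul3(translation_matrix(cx, cy),
                   matmul3(rotation_matrix(angle_rad),
                           translation_matrix(-cx, -cy)))

def apply_to_point(m, x, y):
    """Forward-map a single point (x, y) through the 3x3 matrix m."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w

# Rotate by 90 degrees about the centre of a 100x100 image.
forward = rotation_about(math.pi / 2, 50.0, 50.0)
centre = apply_to_point(forward, 50.0, 50.0)  # the centre stays where it is
corner = apply_to_point(forward, 0.0, 0.0)    # the top-left corner moves
```

The resulting nested list of floats can be passed as forward_transform to transformImg(), for example transformImg(image_decoded, rotation_about(math.pi / 6, w / 2.0, h / 2.0)), where w and h stand for the image width and height.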

PS

The above code works on TensorFlow 1.10.1 and might not work on future versions.

To be honest, I really don't know why tf.contrib.image.transform was designed so that we have to call another function (tf.linalg.inv) to get what we want. I really hope it can be changed to work in a more intuitive way.

Answer 1

Have a look at tf.contrib.image.transform. It applies general projective transforms to an image.

You will also need tf.contrib.image.matrices_to_flat_transforms to convert your affine matrices into the projective format accepted by tf.contrib.image.transform.
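To make the "projective format" concrete: as documented for tf.contrib.image.transform, each transform is a vector of 8 numbers [a0, a1, a2, b0, b1, b2, c0, c1] that maps an output point (x, y) to the input point ((a0 x + a1 y + a2) / k, (b0 x + b1 y + b2) / k), with k = c0 x + c1 y + 1. Because the mapping runs from output to input, a forward matrix has to be inverted first, which is why the code in the question calls tf.linalg.inv. The sketch below mimics this in plain Python for illustration only; flatten_matrix and lookup_input_point are hypothetical names, not TensorFlow functions, and the normalisation by the last matrix entry reflects my understanding of matrices_to_flat_transforms.

```python
def flatten_matrix(m):
    """Flatten a 3x3 matrix to 8 numbers, scaled so the dropped last entry is 1.

    Illustrative stand-in for what matrices_to_flat_transforms produces.
    """
    scale = m[2][2]
    flat = [m[r][c] / scale for r in range(3) for c in range(3)]
    return flat[:8]

def lookup_input_point(flat, x, y):
    """Map an output pixel (x, y) to the input location it is sampled from."""
    a0, a1, a2, b0, b1, b2, c0, c1 = flat
    k = c0 * x + c1 * y + 1.0
    return (a0 * x + a1 * y + a2) / k, (b0 * x + b1 * y + b2) / k

shear_lambda = 0.1
# The forward shear sends input (x, y) to output (x + shear_lambda * y, y).
# Its inverse is simply a shear by -shear_lambda.
inverse_shear = [[1.0, -shear_lambda, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]
flat = flatten_matrix(inverse_shear)

# Input pixel (10, 20) lands at output (12, 20) under the forward shear;
# the flat transform recovers the input location from the output location.
source = lookup_input_point(flat, 12.0, 20.0)
```

Seen this way, the transform op fills every output pixel by looking up where it came from, which avoids holes in the output image.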

Answer 2

I usually use tf.data.Datasets with Dataset.map and tf.py_func. Dataset.prefetch means there's usually no time cost (so long as preprocessing on the CPU takes less time than running your network on the GPU). If you're operating across multiple GPUs you may want to reconsider, but the following works well for me on single-GPU systems.

For simplicity I'll assume you have all your images on disk in separate files, though it can easily be adapted for zip archives or other formats like hdf5 (it won't work for .tar files; I'm not sure why, but I doubt it would be a good idea anyway).

import numpy as np
import tensorflow as tf
from PIL import Image


def map_tf(path_tensor, label_tensor):
    # path_tensor and label_tensor correspond to a single example

    def map_np(path_str):
        # path_str is just a normal string here
        image = np.array(Image.open(path_str), dtype=np.uint8)
        image = any_cv2_or_numpy_augmentations(image)
        return image,

    image, = tf.py_func(
        map_np, (path_tensor,), Tout=(tf.uint8,), stateful=False)
    # any tensorflow operations here.
    image = tf.cast(image, tf.float32) / 255

    image.set_shape((224, 224, 3))
    return image, label_tensor


paths, labels = load_image_paths_and_labels()
dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
if is_training:
    shuffle_buffer = len(paths)  # full shuffling - can be shorter
    dataset = dataset.shuffle(shuffle_buffer).repeat()
dataset = dataset.map(map_tf, num_parallel_calls=8)
dataset = dataset.batch(batch_size)

dataset = dataset.prefetch(1)
# play with the following if you want - not finalized API, and only in
# more recent version of tensorflow
# dataset = dataset.apply(tf.contrib.data.prefetch_to_device('/gpu:0'))

image_batch, label_batch = dataset.make_one_shot_iterator().get_next()

You could also do the decoding in tensorflow and use any_cv2_or_numpy_augmentations directly in py_func (though you don't avoid the tensor -> numpy -> tensor dance you mention in your question). I doubt you'll notice a performance difference either way.
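The any_cv2_or_numpy_augmentations call in the code above is a placeholder. As a rough idea of what could go there, here is a NumPy-only sketch that applies a random horizontal flip and a random zero-padded shift. The function name random_flip_and_shift, the max_shift parameter and the explicit rng argument are illustrative assumptions, not something taken from the answer.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=10):
    """Randomly flip an HxWxC image left-right and shift it by a few pixels.

    Pixels shifted in from outside the image are filled with zeros.
    max_shift is assumed to be smaller than the image height and width.
    """
    if rng.rand() < 0.5:
        image = image[:, ::-1, :]

    h, w = image.shape[:2]
    dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)

    shifted = np.zeros_like(image)
    # Matching source and destination windows for a shift by (dy, dx).
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

rng = np.random.RandomState(0)
example = np.full((224, 224, 3), 255, dtype=np.uint8)
augmented = random_flip_and_shift(example, rng, max_shift=10)
```

Inside map_np this would be called as image = random_flip_and_shift(image, rng). Since a random augmentation does not return the same output for the same input, it is probably safer to pass stateful=True to tf.py_func in that case.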

Check this answer for more options.

Answer 1: score 4, accepted, 2018-09-07 08:04:48

Answer 2: score 0, 2018-09-07 06:15:53
