
Tensorflow - Is there a way to implement tensor-wise image shear/rotation/translation?

I am trying to do different kinds of (image) data augmentation for training my neural network.

I know that tf.image offers some augmentation functions, but they are too simple - for example, I can only rotate the image in multiples of 90 degrees, rather than by an arbitrary angle.

I also know that tf.keras.preprocessing.image offers random rotation, random shear, random shift and random zoom. However, these methods can only be applied to numpy arrays, not to tensors.

I know I can read the images first, use functions from tf.keras.preprocessing.image to do the augmentation, and then convert these augmented numpy arrays to tensors.

However, I just wonder whether there is a way that I can implement tensor-wise augmentations, so that I don't need to bother with the "image file -> tensor -> numpy array -> tensor" procedure.
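For reference, a rough sketch of that numpy-based detour (the file name and the 40-degree rotation range are only example values, and the exact tf.keras.preprocessing.image.random_rotation signature may differ slightly between versions):

import numpy as np
import tensorflow as tf
from PIL import Image

# Read the image into a numpy array, augment it with
# tf.keras.preprocessing.image, then convert the result back to a tensor.
image_np = np.asarray(Image.open("example.jpg"), dtype=np.float32)
rotated_np = tf.keras.preprocessing.image.random_rotation(
    image_np, 40, row_axis=0, col_axis=1, channel_axis=2)  # rotate by up to 40 degrees
rotated_tensor = tf.convert_to_tensor(rotated_np)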


Update for those who want to know how to apply the transform:

For detailed source code, you may want to check tf.contrib.image.transform and tf.contrib.image.matrices_to_flat_transforms.

Here is my code:

def transformImg(imgIn, forward_transform):
    t = tf.contrib.image.matrices_to_flat_transforms(tf.linalg.inv(forward_transform))
    # Note that forward_transform must be a float matrix,
    # e.g. [[2.0, 0, 0], [0, 1.0, 0], [0, 0, 1]] will work,
    # but [[2, 0, 0], [0, 1, 0], [0, 0, 1]] will not.
    imgOut = tf.contrib.image.transform(imgIn, t, interpolation="BILINEAR", name=None)
    return imgOut

Basically, for every point (x, y) in imgIn, the code above computes

    (x', y', 1)^T = forward_transform * (x, y, 1)^T

i.e. each input point (x, y) of imgIn is mapped to the point (x', y') in the output.

A shear transform parallel to the x axis, for example, is

    forward_transform = [[1, λ, 0],
                         [0, 1, 0],
                         [0, 0, 1]]

where λ is the shear factor (shear_lambda in the code below).

Therefore, we can implement a shear transform like this (using transformImg() defined above):

def shear_transform_example(filename, shear_lambda):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    img = transformImg(image_decoded, [[1.0, shear_lambda, 0], [0, 1.0, 0], [0, 0, 1.0]])
    return img

img = shear_transform_example("white_square.jpg", 0.1)

Original image: [white square image, not reproduced here]

After transform: [sheared white square image, not reproduced here]

(Please note that img is a tensor; the code to convert the tensor back to an image file is not included above.)
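In case it is useful, a minimal sketch (not part of the original code, TF 1.x session API assumed) of writing the transformed tensor back to a JPEG file; "sheared.jpg" is just an example output path:

img_uint8 = tf.cast(img, tf.uint8)  # encode_jpeg expects uint8
write_op = tf.write_file("sheared.jpg", tf.image.encode_jpeg(img_uint8))

with tf.Session() as sess:
    sess.run(write_op)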

PS

The above code works on TensorFlow 1.10.1 and might not work on future versions.

To be honest, I really don't know why they designed tf.contrib.image.transform in a way that requires another function (tf.linalg.inv) to get what we want. I really hope they change tf.contrib.image.transform to work in a more intuitive way.

Have a look at tf.contrib.image.transform. It enables applying general projective transforms to an image.

You will also need to have a look at tf.contrib.image.matrices_to_flat_transforms to convert your affine matrices into the projective format accepted by tf.contrib.image.transform.
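For the arbitrary-angle rotation the question also asks about, the same idea works: feed the transformImg() helper from the update above a rotation matrix. A rough sketch (the 30-degree angle is only an example, and this rotates about the top-left corner, so centering the rotation would need extra translations):

import math

theta = math.radians(30.0)  # example angle
rotation_matrix = [[math.cos(theta), -math.sin(theta), 0.0],
                   [math.sin(theta),  math.cos(theta), 0.0],
                   [0.0,              0.0,             1.0]]
rotated_img = transformImg(image_decoded, rotation_matrix)

If a plain rotation is all you need, tf.contrib.image.rotate also rotates images by an arbitrary angle (in radians) directly.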

I usually use tf.data.Dataset with Dataset.map and tf.py_func. Dataset.prefetch means there's usually no time cost (so long as preprocessing on the CPU takes less time than running your network on the GPU). If you're operating across multiple GPUs you may want to reconsider, but the following works well for me on single-GPU systems.

For simplicity I'll assume you have all your images on disk in separate files, though this can easily be adapted for zip archives or other formats like HDF5 (it won't work for .tar files - not sure why, but I doubt it would be a good idea anyway).

import numpy as np
import tensorflow as tf
from PIL import Image


def map_tf_fn(path_tensor, label_tensor):
    # path_tensor and label_tensor correspond to a single example

    def map_np(path_str):
        # path_str is just a normal string here
        image = np.array(Image.open(path_str), dtype=np.uint8)
        image = any_cv2_or_numpy_augmentations(image)
        return image,

    image, = tf.py_func(
        map_np, (path_tensor,), Tout=(tf.uint8,), stateful=False)
    # any tensorflow operations here.
    image = tf.cast(image, tf.float32) / 255

    image.set_shape((224, 224, 3))
    return image, label_tensor


paths, labels = load_image_paths_and_labels()
dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
if is_training:
    shuffle_buffer = len(paths)  # full shuffling - can be shorter
    dataset = dataset.shuffle(shuffle_buffer).repeat()
dataset = dataset.map(map_tf_fn, num_parallel_calls=8)
dataset = dataset.batch(batch_size)

dataset = dataset.prefetch(1)
# play with the following if you want - not finalized API, and only in
# more recent version of tensorflow
# dataset = dataset.apply(tf.contrib.data.prefetch_to_device('/gpu:0'))

image_batch, label_batch = dataset.make_one_shot_iterator().get_next()
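A minimal sketch of consuming the iterator in a TF 1.x training loop; build_model and num_steps below are hypothetical placeholders for your network and schedule:

logits = build_model(image_batch)  # build_model stands in for your network
loss = tf.losses.sparse_softmax_cross_entropy(labels=label_batch, logits=logits)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):  # num_steps is up to you
        sess.run(train_op)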

You could also do the decoding in tensorflow and use any_cv2_or_numpy_augmentations directly in py_func (though you don't avoid the tensor -> numpy -> tensor dance you mention in your question). I doubt you'll notice a performance difference either way.
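A rough sketch of that variant, reusing the same placeholder names as above:

def map_tf_decode_first(path_tensor, label_tensor):
    # Decode with tensorflow ops, then hand the uint8 array to numpy/cv2.
    image = tf.image.decode_jpeg(tf.read_file(path_tensor), channels=3)

    def augment_np(image_np):
        # image_np is a plain numpy uint8 array here
        return any_cv2_or_numpy_augmentations(image_np),

    image, = tf.py_func(augment_np, (image,), Tout=(tf.uint8,), stateful=False)
    image = tf.cast(image, tf.float32) / 255
    image.set_shape((224, 224, 3))
    return image, label_tensor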

Check this answer for more options.
