
Black output with image manipulation in TensorFlow (using a JPEG decoder for neural net training)

I have access to a large number of 2048x2048x3 JPEG pictures, which I store in the TFRecords binary format. Later, I use the stored files to train a deep neural network. To decode the pictures for storage, I currently use two different methods.
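
For reference, each picture ends up in a TFRecords file along these lines (a minimal sketch, not my exact code; the helper name and the "image_raw" feature key are illustrative):

import tensorflow as tf

def write_pictures_to_tfrecords(pictures, out_path="pictures.tfrecords"):
    # pictures: an iterable of numpy arrays, e.g. of shape (2048, 2048, 3)
    with tf.python_io.TFRecordWriter(out_path) as writer:
        for picture in pictures:
            example = tf.train.Example(features=tf.train.Features(feature={
                "image_raw": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[picture.tobytes()]))}))
            writer.write(example.SerializeToString())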

The first one uses TensorFlow. I have defined a function that creates a TensorFlow graph, and I keep reusing the same graph for all the pictures:

def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents)
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor = tf.image.resize_bilinear(picture_4d, resize_shape_as_int)
    return g, picture_name_tensor, final_tensor

Height, Width = 300, 300
graph, nameholder, image_tensor = picture_decoder(Height, Width)
with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init)

    # Loop through the pictures
    for(...picture_name...):
        picture = sess.run(image_tensor, feed_dict={nameholder: picture_name})

The second method uses numpy:

def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    image = image.resize((width, height), Image.LANCZOS)  # PIL's resize expects (width, height)
    image = np.array(image, dtype=np.int32)
    return np.expand_dims(image, axis=0)

Height, Width = 300, 300
for(...picture_name...):
    picture = picture_decoder_numpy(picture_name, Height, Width)

The first method appears to be approximately 6 times faster than the second one.
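
A rough way to measure this is something like the following (a sketch; picture_names stands in for the actual list of file paths):

import time

start = time.perf_counter()
with tf.Session(graph=graph) as sess:
    for picture_name in picture_names:
        sess.run(image_tensor, feed_dict={nameholder: picture_name})
print("tensorflow:", time.perf_counter() - start)

start = time.perf_counter()
for picture_name in picture_names:
    picture_decoder_numpy(picture_name, Height, Width)
print("numpy:", time.perf_counter() - start)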

The issue I am facing is related to the training afterwards. In the first case, the deep neural network I have defined does not learn, i.e., its loss does not improve over many epochs and stays only slightly below 1. Using the second method, without changing any network parameter, the loss reaches values on the order of 1e-05. Am I missing some TensorFlow detail?

I can post the full code if necessary.

Update:

The method using TensorFlow outputs a black picture, while the method using NumPy works as expected.

MVCE for decoding the pictures:

from PIL import Image
import numpy as np
import tensorflow as tf

def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents, dct_method="INTEGER_ACCURATE")
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor = tf.squeeze(tf.image.resize_bilinear(picture_4d, resize_shape_as_int))
    return g, picture_name_tensor, final_tensor

def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    image = image.resize((width, height), Image.LANCZOS)  # PIL's resize expects (width, height)
    return np.array(image, dtype=np.int32)


pic_name = "picture.jpg"
# NumPy method
#picture = picture_decoder_numpy(pic_name, 300, 300)

# TensorFlow method
graph, nameholder, picture_tensor = picture_decoder(300, 300)
with tf.Session(graph=graph) as sess:
    init = tf.group()
    sess.run(init)
    picture = sess.run(picture_tensor, feed_dict={nameholder: pic_name})

im = Image.fromarray(picture.astype('uint8'))
im.save("save.jpg")

The TF implementation does not do what you think it does. The problem is that tf.image.convert_image_dtype converts the image values to the [0, 1] range, while in the NumPy version the values stay in the [0, 255] range. Casting those [0, 1] floats to uint8 when saving truncates almost everything to 0, which is why the picture comes out black.
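
You can see this rescaling directly (a minimal check; the sample values are just an illustration):

import tensorflow as tf

with tf.Session() as sess:
    x = tf.constant([[0, 128, 255]], dtype=tf.uint8)
    print(sess.run(tf.image.convert_image_dtype(x, tf.float32)))
    # prints approximately [[0.  0.502  1.]]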

One way to solve it is to multiply your final result by 255.

def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents, dct_method="INTEGER_ACCURATE")
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor = tf.squeeze(tf.image.resize_bilinear(picture_4d, resize_shape_as_int)) * 255  # FIX: rescale to typical 8-bit range
    return g, picture_name_tensor, final_tensor

Of course, the two arrays should not match exactly, since you also use two different interpolation methods (bilinear in TensorFlow, Lanczos in PIL).

>>> norm_dist = np.abs(np.sum(arr - arr2)) / (np.sum(arr) + np.sum(arr2)) / 2
>>> np.isclose(norm_dist, 0, atol=1e-4)
True

(assuming arr holds the result of the NumPy implementation, and arr2 the TensorFlow one).
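
If you want the two results to agree more closely, you can match the interpolation on the PIL side as well (a sketch, reusing pic_name from the MVCE above):

image = Image.open(pic_name).resize((300, 300), Image.BILINEAR)
arr = np.array(image, dtype=np.float32)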
