I have access to a large number of 2048x2048x3 JPEG pictures, which I store in the TFRecords binary format. Later, I use the stored files to train a deep neural network. To store the pictures, I am currently using two different methods.
The first one uses TensorFlow. I have defined a function that creates a TensorFlow graph, and I keep reusing the same graph for all the pictures:
def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents)
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor = tf.image.resize_bilinear(picture_4d, resize_shape_as_int)
    return g, picture_name_tensor, final_tensor
Height, Width = 300, 300
graph, nameholder, image_tensor = picture_decoder(Height, Width)
with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init)
    # Loop through the pictures
    for picture_name in (...):
        picture = sess.run(image_tensor, feed_dict={nameholder: picture_name})
The second method uses numpy:
def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    image = image.resize((height, width), Image.LANCZOS)
    image = np.array(image, dtype=np.int32)
    return np.expand_dims(image, axis=0)
Height, Width = 300, 300
for picture_name in (...):
    picture = picture_decoder_numpy(picture_name, Height, Width)
The first method appears to be approximately 6 times faster than the second one.
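For reference, a minimal timing harness for the numpy/PIL method (the synthetic `test.jpg` is an illustrative stand-in for the real pictures; the TF method can be timed the same way by wrapping the `sess.run` call):

```python
import timeit

import numpy as np
from PIL import Image

# Create a synthetic test JPEG as a stand-in for a real picture
Image.fromarray(
    np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
).save("test.jpg")

def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    image = image.resize((height, width), Image.LANCZOS)
    return np.array(image, dtype=np.int32)

# Average over several runs to smooth out disk-cache effects
t = timeit.timeit(lambda: picture_decoder_numpy("test.jpg", 300, 300), number=20)
print("numpy/PIL decode+resize: %.1f ms per picture" % (t / 20 * 1000))
```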
The issue I am facing is related to the training afterwards. In the first case, the deep neural net I have defined does not learn, i.e., its loss does not improve over many epochs and stays only slightly below 1. Using the second method, without changing any neural net parameter, the loss reaches values on the order of 1e-05. Am I missing some TensorFlow detail?
I can post the full code if necessary.
Update:
The method using Tensorflow outputs a black picture, while the method using numpy works as expected.
MVCE for decoding the pictures:
from PIL import Image
import numpy as np
import tensorflow as tf
def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents, dct_method="INTEGER_ACCURATE")
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor = tf.squeeze(tf.image.resize_bilinear(picture_4d, resize_shape_as_int))
    return g, picture_name_tensor, final_tensor
def picture_decoder_numpy(picture_name, height, width):
    image = Image.open(picture_name)
    image = image.resize((height, width), Image.LANCZOS)
    return np.array(image, dtype=np.int32)
pic_name = "picture.jpg"

# Numpy method
#picture = picture_decoder_numpy(pic_name, 300, 300)

# Tensorflow method
graph, nameholder, picture_tensor = picture_decoder(300, 300)
with tf.Session(graph=graph) as sess:
    init = tf.group()
    sess.run(init)
    picture = sess.run(picture_tensor, feed_dict={nameholder: pic_name})

im = Image.fromarray(picture.astype('uint8'))
im.save("save.jpg")
The TF implementation does not do what you think it does. The problem is that `tf.image.convert_image_dtype` scales the image values to the [0, 1] range, while in the numpy version the values stay in the [0, 255] range.
One way to solve it is to multiply your final result by 255.
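In numpy terms, the rescaling that `convert_image_dtype` performs on uint8 input amounts to a division by 255 (a small sketch with made-up values):

```python
import numpy as np

# For uint8 input, tf.image.convert_image_dtype(x, tf.float32) divides by the
# dtype maximum, 255, so the output lies in [0, 1]
arr_uint8 = np.array([0, 64, 255], dtype=np.uint8)
as_float = arr_uint8.astype(np.float32) / 255.0
print(as_float)  # values now lie in [0, 1]
```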
def picture_decoder(height, width):
    g = tf.Graph()
    with g.as_default():
        picture_name_tensor = tf.placeholder(tf.string)
        picture_contents = tf.read_file(picture_name_tensor)
        picture = tf.image.decode_jpeg(picture_contents, dct_method="INTEGER_ACCURATE")
        picture_as_float = tf.image.convert_image_dtype(picture, tf.float32)
        picture_4d = tf.expand_dims(picture_as_float, 0)
        resize_shape = tf.stack([height, width])
        resize_shape_as_int = tf.cast(resize_shape, dtype=tf.int32)
        final_tensor = tf.squeeze(tf.image.resize_bilinear(picture_4d, resize_shape_as_int)) * 255  # FIX: rescale to typical 8-bit range
    return g, picture_name_tensor, final_tensor
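This also explains the black output: without the `* 255` rescale, the `uint8` cast in the MVCE truncates every value in [0, 1) down to 0. A minimal numpy sketch (the random array is a stand-in for the TF output):

```python
import numpy as np

# Stand-in for the TF result: float values in [0, 1), as convert_image_dtype produces
picture = np.random.rand(300, 300, 3).astype(np.float32)

print(picture.astype('uint8').max())          # truncates everything below 1.0 to 0 -> black image
print((picture * 255).astype('uint8').max())  # rescaled values survive the cast
```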
Of course, the two arrays should not match exactly, since you also use two different interpolation methods (bilinear vs. Lanczos). Still, the normalized distance between them is tiny:
norm_dist = np.abs(np.sum(arr - arr2)) / (np.sum(arr) + np.sum(arr2)) / 2
np.isclose(norm_dist, 0, atol=1e-4)
# True
(assuming `arr` contains the numpy implementation, and `arr2` the tensorflow one).
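The same distance check can be reproduced without TensorFlow by comparing two PIL resamplers on a synthetic image (an illustrative sketch, not the original data):

```python
import numpy as np
from PIL import Image

# Synthetic source image standing in for the decoded JPEG
img = Image.fromarray(np.random.randint(0, 255, (600, 600, 3), dtype=np.uint8))

# Two different interpolation methods, as in the question
arr = np.array(img.resize((300, 300), Image.LANCZOS), dtype=np.float64)
arr2 = np.array(img.resize((300, 300), Image.BILINEAR), dtype=np.float64)

# Normalized distance between the two results
norm_dist = np.abs(np.sum(arr - arr2)) / (np.sum(arr) + np.sum(arr2)) / 2
print(norm_dist)  # small: both filters preserve the overall intensity
```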