
Tensorflow CNN does not learn (image in - image out)

I'm stuck working on a TensorFlow convolutional neural network for a university project and I hope somebody can help me.
It's supposed to output a picture for a picture input. Left is the input, right is the output. Both are in .jpeg format.

[image: input and output]

The weights look like this. The left image shows the weights before learning, the right after a few epochs; they do not change at all with further training.
The net does not seem to learn anything useful and I have a feeling I forgot something basic. The accuracy peaks at around 5% during training.

[image: weights]

Here is what it looks like when I save the input image x.
I don't know if I am making a mistake loading or saving the image.

And this is what the output y of the net looks like:

I based the code on the TensorFlow MNIST tutorial. Here is my code, which I have shortened to make it more readable:

import tensorflow as tf
from PIL import Image
import numpy as np

def weight_variable(dim,stddev=0.35):
    init = tf.random_normal(dim, stddev=stddev)
    return tf.Variable(init)

def bias_variable(dim,val=0.1):
    init = tf.constant(val, shape=dim)
    return tf.Variable(init)

def conv2d(x,W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding = 'SAME')

def max_pool2x2(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding = 'SAME')

def output_pics(pic): # for weights
    # 1-color (single-channel) array cast to uint8 and output as jpeg to file
    pass  # body omitted for readability

def output_pics_color(pic):
    # 3-color (3-channel) array cast to uint8 and output as jpeg to file
    pass  # body omitted for readability

def show_pic(pic):
    # 3-color (3-channel) array cast to uint8 and shown in a window
    pass  # body omitted for readability


filesX = [...] # filenames of inputs for training
filesY = [...] # filenames of outputs for training
test_filesX = [...] # filenames of inputs for testing
test_filesY = [...] # filenames of outputs for testing
px_size = 128 # size of images 128x128 (resized)


filename_queueX = tf.train.string_input_producer(filesX)
filename_queueY = tf.train.string_input_producer(filesY)
filename_testX = tf.train.string_input_producer(test_filesY)
filename_testY = tf.train.string_input_producer(test_filesY)

image_reader = tf.WholeFileReader()
img_name, img_dataX = image_reader.read(filename_queueX)
imageX = tf.image.decode_jpeg(img_dataX)
imageX = tf.image.resize_images(imageX, [px_size,px_size])
imageX.set_shape((px_size,px_size,3))
imageX=tf.cast(imageX, tf.float32)

# ... same for imageY, test_imageX, test_imageY

trainX = []
trainY = []
testX = []
testY = []
j=1


with tf.name_scope('model'):
    x=tf.placeholder(tf.float32, [None, px_size,px_size,3])
    prob = tf.placeholder(tf.float32)

    init_op = tf.global_variables_initializer()

    # load images into lists
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        for i in range(1,65):
            trainX.append(imageX.eval())
            trainY.append(imageY.eval())
        for i in range(1, 10):
            testX.append(test_imageX.eval())
            testY.append(test_imageY.eval())
        coord.request_stop()
        coord.join(threads)    

    # layer 1 
    x_img = tf.reshape(x,[-1,px_size,px_size, 3])    
    W1 = weight_variable([20,20,3,3])
    b1 = bias_variable([3])                       
    y1 = tf.nn.softmax(conv2d(x_img,W1)+b1)

    # layer 2
    W2 = weight_variable([30,30,3,3])
    b2 = bias_variable([3])
    y2=tf.nn.softmax(conv2d(y1, W2)+b2)

    # layer 3
    W3 = weight_variable([40,40,3,3])
    b3 = bias_variable([3])
    y3=tf.nn.softmax(conv2d(y2, W3)+b3)


    y = y3

    with tf.name_scope('train'):
        y_ =tf.placeholder(tf.float32, [None, px_size,px_size,3])
        cross_entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y))
        opt = tf.train.MomentumOptimizer(learning_rate=0.5, momentum=0.1).minimize(cross_entropy)

    with tf.name_scope('eval'):
        correct = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
        accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

        nEpochs = 1000
        batchSize = 10
        res = 0
        with tf.Session() as sess:
            init = tf.global_variables_initializer()
            sess.run(init)
            trAccs = []
            for i in range(nEpochs):
                if i%100 == 0 :
                    train_accuracy = sess.run(accuracy, feed_dict={x:trainX, y_:trainY, prob: 1.0})
                    print(train_accuracy)
                    output_pics(W1)#output weights of layer 1 to file
                    output_pics_color(x)#save input image
                    output_pics_color(y)#save net output
                    sess.run(opt, feed_dict={x:trainX, y_:trainY, prob: 0.5})

• This is an image generation problem.
• The model you selected is a very bad model for image generation tasks.
• Normal CNNs are used for image recognition and object detection tasks.
• The TensorFlow MNIST tutorial is an image classification problem, not an image generation problem.
• It is very important to select an appropriate model type for a particular problem.
• Clearly, with this model there is no chance of achieving the output that you have mentioned.
• I do not even understand how you are calculating the accuracy, because this is an unsupervised learning problem.
• You have used softmax after every layer, which is a really bad idea; the TensorFlow MNIST tutorial does not even have this code.
• Softmax is only used in the last layer.
• In the hidden layers, leaky ReLU or plain ReLU should be used (see the sketch after this list).
• I would suggest you look for a more appropriate deep-learning model.
• Specifically, a combination of a Variational Auto-Encoder and a Generative Adversarial Network (VAE-GAN), or a simple Generative Adversarial Network.
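
To make the activation and loss points concrete, here is a minimal sketch in the same TF 1.x style as the question, reusing its conv2d helper, the weight and bias variables W1..W3 and b1..b3, and the x_img and y_ placeholders. It only illustrates ReLU hidden layers, raw logits at the output, and a per-pixel error measure in place of the argmax accuracy; the Adam learning rate and the scaling of the target images to [0, 1] are assumptions, and this is not a full generative model such as a VAE-GAN.

# hidden layers: ReLU instead of a softmax after every convolution
h1 = tf.nn.relu(conv2d(x_img, W1) + b1)
h2 = tf.nn.relu(conv2d(h1, W2) + b2)

# output layer: keep the raw logits, no activation here
logits = conv2d(h2, W3) + b3

# pixel-wise loss on the logits; assumes y_ holds raw 0-255 pixel values,
# as produced by the question's loading code
y_scaled = y_ / 255.0
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_scaled, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)

# instead of an argmax "accuracy", track a per-pixel error
prediction = tf.nn.sigmoid(logits)   # predicted image in [0, 1]
pixel_error = tf.reduce_mean(tf.abs(prediction - y_scaled))

Even with these changes a plain three-layer convolution stack remains a weak image-to-image model; the GAN and VAE-GAN families mentioned above are the more appropriate direction.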
