
TensorFlow CNN model always predicts the same class

I have been trying to develop a CNN model for image classification. I am new to TensorFlow and have been getting help from the following books:

  • Learning TensorFlow: A Guide to Building Deep Learning Systems

  • TensorFlow For Machine Intelligence by Sam Abrahams

For the past few weeks I have been working on developing a good model, but I always get the same prediction. I have tried many different architectures, with no luck!

Recently I decided to test my model on the CIFAR-10 dataset, using exactly the same model as given in the Learning TensorFlow book. But the outcome was the same (the same class for every image), even after training for 50K steps.
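One way to quantify the "same class for every image" symptom, independent of TensorFlow, is to count the predicted class indices. The sketch below is plain Python; the function name prediction_report is hypothetical, and it assumes the predictions and true labels are available as lists of integer class indices. On a balanced 10-class test set such as CIFAR-10's, a model that always outputs one class scores 10% accuracy, which is exactly chance level.

```python
from collections import Counter


def prediction_report(predicted, true):
    """Summarize integer class predictions against the true labels.

    Returns (accuracy, most_common_class, share_of_most_common_class).
    A share close to 1.0 means the model predicts (almost) only one class.
    """
    if not predicted or len(predicted) != len(true):
        raise ValueError("predicted and true must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted, true))
    top_class, top_count = Counter(predicted).most_common(1)[0]
    return correct / len(predicted), top_class, top_count / len(predicted)


# Usage: 100 balanced labels (10 per class) and a model that always answers class 3
labels = [i % 10 for i in range(100)]
print(prediction_report([3] * 100, labels))  # (0.1, 3, 1.0)
```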

Here are the highlights of my model and code:

1) Downloaded the CIFAR-10 image sets and converted them into TFRecord files with labels (in the TFRecord file, the label of each CIFAR-10 category is stored as a string), one file each for the training and test sets.

2) Read the images from the TFRecord file and generate randomly shuffled batches of size 100.

3) Convert each label from a string to an int32 index in the range 0-9, one index per category.

4) Pass the training and test batches to the network and get an output of size [batch_size, num_class].

5) Train the model using the Adam optimizer and the softmax cross-entropy loss function (I have also tried the gradient descent optimizer).

6) Evaluate the model on test batches before and after training.

7) Get the same prediction for the entire data set (but a different one every time I rerun the code to try again).
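Step 3 (string label to 0-9 index) is what the tf.stack/tf.equal/tf.argmax construction for Train_Y and Test_Y does in the code. A plain-Python sketch of the same mapping follows, assuming the class names and order used in that code; the names CIFAR10_CLASSES and labels_to_indices are hypothetical. One design difference worth noting: a dictionary lookup fails loudly on a label that matches no class name (for example, one with stray whitespace or a different spelling), whereas in the stack/argmax approach such a label produces all zeros and tf.argmax still returns some index in 0-9 for it.

```python
# Class names in the same order as the tf.equal(...) calls in the question's code,
# so position i here corresponds to index i produced there.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CIFAR10_CLASSES)}


def labels_to_indices(labels):
    """Map label names (str or bytes) to 0-based class indices.

    Raises KeyError for a name that is not one of the ten classes.
    """
    indices = []
    for label in labels:
        if isinstance(label, bytes):  # tf.string values fetched with sess.run arrive as bytes
            label = label.decode("utf-8")
        indices.append(LABEL_TO_INDEX[label])
    return indices


print(labels_to_indices(["airplane", b"cat", "truck"]))  # [0, 3, 9]
```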

Is there something I am doing wrong here? I would appreciate it if someone could help me with this problem.

Note: my approach of converting images and labels into TFRecord files may be unusual, but believe me, I took this idea from the books I mentioned earlier.
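Because the image is stored as raw bytes and read back with tf.decode_raw(..., tf.uint8) followed by tf.reshape(..., [H, W, C]), every record has to hold exactly 32 × 32 × 3 = 3072 bytes, in the same height-width-channel (row-major) order that was used when writing it. The plain-Python sketch below mimics that decode step to make the layout explicit; decode_image is a hypothetical name, not a TensorFlow function. Since the length of tf.decode_raw's output is not known when the graph is built, a size mismatch would most likely surface only when the session runs, not in a try/except around the tf.reshape call.

```python
H, W, C = 32, 32, 3  # CIFAR-10 image size, as in the question's code


def decode_image(raw, height=H, width=W, channels=C):
    """Turn raw uint8 bytes into a height x width x channels nested list.

    Mirrors decode_raw (bytes -> ints 0..255) plus a row-major reshape:
    pixel (r, c), channel k is stored at offset (r * width + c) * channels + k.
    """
    expected = height * width * channels
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    values = list(raw)  # iterating over bytes yields ints in 0..255
    return [[values[(r * width + c) * channels:(r * width + c + 1) * channels]
             for c in range(width)]
            for r in range(height)]


image = decode_image(bytes(range(256)) * 12)  # 3072 bytes of test data
print(image[0][1], image[1][0])  # [3, 4, 5] [96, 97, 98]
```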

My code for the problem:

import tensorflow as tf
import numpy as np
import datetime as dt
import PIL.Image

# The glob module allows directory listing
import glob
import random

from itertools import groupby
from collections import defaultdict

H , W  = 32 , 32        # Height and width of the image
C = 3                   # Number of channels


sessInt = tf.InteractiveSession()

# Read file and return the batches of the input data
def get_Batches_From_TFrecord(tf_record_filenames_list, batch_size):
    # Match and load all the tfrecords found in the specified directory
    tf_record_filename_queue = tf.train.string_input_producer(tf_record_filenames_list)

    # It may have more than one example in them.
    tf_record_reader = tf.TFRecordReader()
    tf_image_name, tf_record_serialized = tf_record_reader.read(tf_record_filename_queue)

    # The label and image are stored as bytes but could be stored as int64 or float64 values in a
    # serialized tf.Example protobuf.
    tf_record_features = tf.parse_single_example(tf_record_serialized,
                                                 features={'label': tf.FixedLenFeature([], tf.string),
                                                           'image': tf.FixedLenFeature([], tf.string), })

    # Using tf.uint8 because all of the channel information is between 0-255
    tf_record_image = tf.decode_raw(tf_record_features['image'], tf.uint8)

    try:
        # Reshape the image to look like the input image
        tf_record_image = tf.reshape(tf_record_image, [H, W, C])

    except:
        print(tf_image_name)

    tf_record_label = tf.cast(tf_record_features['label'], tf.string)

    '''
    #Check the image and label

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sessInt, coord=coord)

    label = tf_record_label.eval().decode()
    print(label)

    image = PIL.Image.fromarray(tf_record_image.eval())
    image.show()

    coord.request_stop()
    coord.join(threads)
    '''

    # creating a batch to feed the data

    min_after_dequeue = 10 * batch_size
    capacity = min_after_dequeue + 5 * batch_size

    # Shuffle examples while feeding in the queue
    image_batch, label_batch = tf.train.shuffle_batch([tf_record_image, tf_record_label], batch_size=batch_size,
                                                      capacity=capacity, min_after_dequeue=min_after_dequeue)

    # Sequential feed in the examples in the queue (Don't shuffle)
    # image_batch, label_batch = tf.train.batch([tf_record_image, tf_record_label], batch_size=batch_size, capacity=capacity)

    # Converting the images to a float to match the expected input to convolution2d
    float_image_batch = tf.image.convert_image_dtype(image_batch, tf.float32)

    string_label_batch = label_batch

    return float_image_batch, string_label_batch

#Count the number of images in the tfrecord file

def number_of_records(tfrecord_file_name):
    count = 0
    record_iterator = tf.python_io.tf_record_iterator(path = tfrecord_file_name)
    for record in record_iterator:
        count+=1

    return count

def get_num_of_samples(tfrecords_list):
    total_samples = 0
    for tfrecord in tfrecords_list:
        total_samples += number_of_records(tfrecord)

    return total_samples

# Provide the input tfrecord names in a list
train_filenames = ["./TFRecords/cifar_train.tfrecord"]
test_filename = ["./TFRecords/cifar_test.tfrecord"]

num_train_samples = get_num_of_samples(train_filenames)
num_test_samples = get_num_of_samples(test_filename)


print("Number of Training samples: ", num_train_samples)
print("Number of Test samples: ", num_test_samples)


'''
IMP Note: (Batch_size * Training_Steps) should be greater than (2 * Number_of_samples) for shuffling of batches

'''
train_batch_size = 100

# Total number of batches for input records
# Note - Num of samples in the tfrecord file can be determined by the tfrecord iterator.

# Batch size for test samples
test_batch_size = 50

train_image_batch, train_label_batch = get_Batches_From_TFrecord(train_filenames, train_batch_size)
test_image_batch, test_label_batch = get_Batches_From_TFrecord(test_filename, test_batch_size)


# Definition of the convolution network, which returns one output neuron per class (10 logits) for each input image in the batch


# Define placeholders for the keep probability in dropout
# (Dropout should only be used during training; for testing, the keep probability should always be 1.0)

fc_prob = tf.placeholder(tf.float32)
conv_prob = tf.placeholder(tf.float32)

# Helper function to add the feature maps of one input in the batch to the TensorBoard summary as images
def add_filter_summary(name, filter_tensor):

    rand_idx = random.randint(0,filter_tensor.get_shape()[0]-1)  # Choose a random index from [0, batch_size)

    #display_filter = filter_tensor[random.randint(0,filter_tensor.get_shape()[3])]

    display_filter = filter_tensor[5]        # keep the index fixed for consistent visualization

    with tf.name_scope("Filter_Summaries"):
        img_summary = tf.summary.image(name, tf.reshape(display_filter,[-1 , filter_tensor.get_shape()[1],filter_tensor.get_shape()[1],1] ), max_outputs = 500)


# Helper functions for the network

def weight_initializer(shape):
    weights = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(weights)


def bias_initializer(shape):
    biases = tf.constant(0.1, shape=shape)
    return tf.Variable(biases)


def conv2d(input, weights, stride):
    return tf.nn.conv2d(input, filter=weights, strides=[1, stride, stride, 1], padding="SAME")


def pool_layer(input, window_size=2 , stride=2):
    return tf.nn.max_pool(input, ksize=[1, window_size, window_size, 1], strides=[1, stride, stride, 1], padding='VALID')


# This is the actual layer we will use.
# Linear convolution as defined in conv2d, with a bias,
# followed by the ReLU nonlinearity.
def conv_layer(input, filter_shape , stride=1):
    W = weight_initializer(filter_shape)
    b = bias_initializer([filter_shape[3]])
    return tf.nn.relu(conv2d(input, W, stride) + b)


# A standard full layer with a bias. Notice that here we didn’t add the ReLU.
# This allows us to use the same layer for the final output,
# where we don’t need the nonlinear part.
def full_layer(input, out_size):
    in_size = int(input.get_shape()[1])
    W = weight_initializer([in_size, out_size])
    b = bias_initializer([out_size])
    return tf.matmul(input, W) + b

## Model from the book Learning TensorFlow - for CIFAR data

def conv_network(image_batch, batch_size):
    # Now create the model, which returns the output neurons (equal to the number of labels)
    # as the output of a final fully connected layer, which we can use as input to the softmax classifier

    C1 , C2 , C3 = 30 , 50, 80      # Number of output features for each convolution layer
    F1 = 500                        # Number of output neuron for FC1 layer

    #Add original image to tensorboard summary

    add_filter_summary("Original" , image_batch)

    # First convolution layer with 3x3 filters and C1 (30) output channels
    conv1 = conv_layer(image_batch, filter_shape=[3, 3, C, C1])
    pool1 = pool_layer(conv1, window_size=2)

    pool1 = tf.nn.dropout(pool1, keep_prob=conv_prob)

    add_filter_summary("conv1" , pool1)

    # Second convolution layer with 5x5 filters and C2 (50) output channels
    conv2 = conv_layer(pool1, filter_shape=[5, 5, C1, C2])
    pool2 = pool_layer(conv2, 2)
    pool2 = tf.nn.dropout(pool2, keep_prob=conv_prob)

    add_filter_summary("conv2" , pool2)

    # Third convolution layer with 5x5 filters and C3 (80) output channels

    conv3 = conv_layer(pool2, filter_shape=[5, 5, C2, C3])

    # At this point the feature maps are 8×8 (the first two poolings each
    # halved the 32×32 pictures along each axis).
    # This last pool layer pools each of the feature maps and keeps only the maximal value.
    # The number of feature maps in the third block is set to 80,
    # so after this max pooling the representation is reduced to only 80 numbers.


    pool3 = pool_layer(conv3, window_size = 8 , stride=8)
    pool3 = tf.nn.dropout(pool3, keep_prob=conv_prob)

    add_filter_summary("conv3" , pool3)

    # Reshape the output to feed to the FC layer
    # (-1 tells tf.reshape to infer the remaining dimension from the input, i.e. everything other than batch_size)
    flattened_layer = tf.reshape(pool3, [batch_size, -1])

    fc1 = tf.nn.relu(full_layer(flattened_layer, F1))

    full1_drop = tf.nn.dropout(fc1, keep_prob=fc_prob)

    # Fully connected layer 2 (output layer)
    final_Output = full_layer(full1_drop, 10)

    return final_Output, tf.summary.merge_all()

# Now that the architecture is created, the next step is to create the classification model
# (to predict the output class of the input data).
# Since this is a multi-class problem (10 classes), softmax is used to turn the network output into class probabilities.


# Build the network outputs (logits) for the training and test batches
Train_X , img_summary = conv_network(train_image_batch, train_batch_size)
Test_X , _ = conv_network(test_image_batch, test_batch_size)

# Generate 0 based index for labels
Train_Y = tf.to_int32(tf.argmax(
    tf.to_int32(tf.stack([tf.equal(train_label_batch, ["airplane"]), tf.equal(train_label_batch, ["automobile"]), 
                          tf.equal(train_label_batch, ["bird"]),tf.equal(train_label_batch, ["cat"]),
                          tf.equal(train_label_batch, ["deer"]),tf.equal(train_label_batch, ["dog"]),
                          tf.equal(train_label_batch, ["frog"]),tf.equal(train_label_batch, ["horse"]),
                          tf.equal(train_label_batch, ["ship"]), tf.equal(train_label_batch, ["truck"]) ])), 0))

Test_Y = tf.to_int32(tf.argmax(
        tf.to_int32(tf.stack([tf.equal(test_label_batch, ["airplane"]), tf.equal(test_label_batch, ["automobile"]), 
                          tf.equal(test_label_batch, ["bird"]),tf.equal(test_label_batch, ["cat"]),
                          tf.equal(test_label_batch, ["deer"]),tf.equal(test_label_batch, ["dog"]),
                          tf.equal(test_label_batch, ["frog"]),tf.equal(test_label_batch, ["horse"]),
                          tf.equal(test_label_batch, ["ship"]), tf.equal(test_label_batch, ["truck"]) ])), 0))


# Y =  tf.reshape(float_label_batch, X.get_shape())


# Compute the inference model over data X and return the result
# (softmax converts the logits of this multi-class problem into class probabilities)
def inference(X):
    return tf.nn.softmax(X)


# Compute the loss over training data X and expected outputs Y
# Cross-entropy is better suited for classification loss than the squared error function

def loss(X, Y):
    return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=X, labels=Y))


# Train / adjust the model parameters according to the computed total loss (using the Adam optimizer)
def train(total_loss, learning_rate):
    return tf.train.AdamOptimizer(learning_rate).minimize(total_loss)


# evaluate the resulting trained model with dropout probability (Ideally 1.0 for testing)
def evaluate(sess, X, Y, dropout_prob):
    # predicted = tf.cast(inference(X) > 0.5 , tf.float32)

    #print("\nNetwork output:")
    #print(sess.run(inference(X) , feed_dict={conv_prob:1.0 , fc_prob:1.0}))

    # Inference contains the predicted probability of each class for each input image.
    # The class having higher probability is the prediction of the network. y_pred_cls = tf.argmax(y_pred, dimension=1)
    predicted = tf.cast(tf.argmax(X, 1), tf.int32)

    #print("\npredicted labels:")
    #print(sess.run(predicted , feed_dict={conv_prob:1.0 , fc_prob:1.0}))
    #print("\nTrue Labels:")
    #print(sess.run(Y , feed_dict={conv_prob:1.0 , fc_prob:1.0}))

    batch_accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32))

    # Calculate the mean of the accuracies of each batch (iteration).
    # The number of iterations should satisfy (test_batch_size * num_of_iterations) >= (2 * num_of_test_samples)
    total_accuracy = np.mean([sess.run(batch_accuracy, feed_dict={conv_prob:1.0 , fc_prob:1.0}) for i in range(250)])

    print("Accuracy of the model(in %): {:.4f} ".format(100 * total_accuracy))

# create a saver class to save the training checkpoints
saver = tf.train.Saver(max_to_keep=10)

# Create a TensorBoard summary for the loss function
with tf.name_scope("summaries"):
    loss_summary = tf.summary.scalar("loss", loss(Train_X, Train_Y))

#merged = tf.summary.merge_all()

# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
    log_writer = tf.summary.FileWriter('./logs', sess.graph)

    total_loss = loss(Train_X, Train_Y)

    train_op = train(total_loss, 0.001)

    #Initialise all variables after defining all variables
    tf.global_variables_initializer().run()

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    print(sess.run(Train_Y))
    print(sess.run(Test_Y))

    evaluate(sess, Test_X, Test_Y,1.0)


    # actual training loop------------------------------------------------------
    training_steps = 50000
    print("\nStarting to train model with", str(training_steps), " steps...")
    to1 = dt.datetime.now()

    for step in range(1, training_steps + 1):

        # print(sess.run(train_label_batch))
        sess.run([train_op], feed_dict={fc_prob: 0.5 , conv_prob:0.8})  # Pass the dropout value for training batch to the placeholder

        # for debugging and learning purposes, see how the loss gets decremented thru training steps

        if step % 100 == 0:
            # print("\n")
            # print(sess.run(train_label_batch))
            loss_summaries, img_summaries , Tloss = sess.run([loss_summary, img_summary, total_loss],
                                      feed_dict={fc_prob: 0.5 , conv_prob:0.8})  # evaluate total loss to add it in summary object
            log_writer.add_summary(loss_summaries, step)  # add summary for each step
            log_writer.add_summary(img_summaries, step)
            print("Step:", step, " , loss: ", Tloss)

        if step%2000 == 0:
            saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
            print("\n")
            evaluate(sess, Test_X, Test_Y,1.0)

    saver.save(sess, "./Models/BookLT_CIFAR", global_step=step, latest_filename="model_chkpoint")
    to2 = dt.datetime.now()
    print("\nTotal Training time Elapsed: ", str(to2 - to1))


    # once the training is complete, evaluate the model with test (validation set)-------------------------------------------

    # Restore the model file and perform the testing
    #saver.restore(sess, "./Models/BookLT3_CIFAR-15000")

    print("\nPost Training....")

    # Perform evaluation of the model on batches of test samples.
    # In order to evaluate the entire test set, the number of iterations should be chosen such that
    # (test_batch_size * num_of_iterations) >= (2 * num_of_test_samples)

    evaluate(sess, Test_X, Test_Y,1.0)  # Evaluate multiple batches of the test data set (randomly drawn from the shuffled test batch queue)
    evaluate(sess, Test_X, Test_Y,1.0)
    evaluate(sess, Test_X, Test_Y,1.0)

    coord.request_stop()
    coord.join(threads)
    sess.close()

  • [Screenshot: pre-training result]

  • [Screenshot: result during training]

  • [Screenshot: post-training result]
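The comment inside conv_network about feature-map sizes can be checked with plain arithmetic. With padding="SAME", a stride-1 convolution keeps the spatial size, and a padding='VALID' max-pool with window k and stride s maps a side of length n to floor((n - k) / s) + 1. The sketch below (plain Python, hypothetical helper names) traces the 32×32 input through the three conv/pool blocks and confirms that the flattened layer fed to the first fully connected layer has 80 features.

```python
def conv_same(n, stride=1):
    """Side length after a 'SAME'-padded convolution: ceil(n / stride)."""
    return -(-n // stride)


def pool_valid(n, window, stride):
    """Side length after a 'VALID' max-pool: floor((n - window) / stride) + 1."""
    return (n - window) // stride + 1


def trace_conv_network(size=32, last_channels=80):
    """Side lengths after each conv+pool block of the question's conv_network,
    plus the number of features in the flattened layer."""
    sides = []
    n = size
    for window, stride in [(2, 2), (2, 2), (8, 8)]:  # pool1, pool2, pool3
        n = pool_valid(conv_same(n), window, stride)  # the convolutions use stride 1
        sides.append(n)
    return sides, n * n * last_channels


print(trace_conv_network())  # ([16, 8, 1], 80)
```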

I did not run the code to verify that this is the only issue, but here is one important issue. When classifying, you should use one-hot encoding for your labels. That is, if you have 3 classes, you want your labels to be [1, 0, 0] for class 1, [0, 1, 0] for class 2, and [0, 0, 1] for class 3. Your approach of using 1, 2, and 3 as labels leads to various issues. For example, for an image from class 3, the network is penalized more for predicting class 1 than for predicting class 2. TensorFlow functions like tf.nn.softmax_cross_entropy_with_logits work with such representations.
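A plain-Python sketch of the one-hot representation described above and of the softmax cross-entropy computed against it (the helper names are hypothetical; in TensorFlow, tf.nn.softmax_cross_entropy_with_logits computes this loss from logits and one-hot labels). For comparison, it also computes the loss directly from an integer class index, which is the form tf.nn.sparse_softmax_cross_entropy_with_logits in the question's loss() expects. For a single correct class, the one-hot vector simply selects the log-probability at that index, so the two values printed on each row agree.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """One-hot vector for a class index, e.g. one_hot(2, 3) -> [0.0, 0.0, 1.0]."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def cross_entropy_one_hot(logits, target):
    """-sum_k target_k * log(softmax(logits)_k) for a one-hot (or soft) target."""
    probs = softmax(logits)
    # Terms with target_k == 0 contribute nothing, so skip them (avoids log(0) on underflow)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def cross_entropy_sparse(logits, index):
    """-log(softmax(logits)[index]): the loss computed from an integer class index."""
    return -math.log(softmax(logits)[index])


logits = [2.0, 1.0, 0.1]
for k in range(3):
    print(k, cross_entropy_one_hot(logits, one_hot(k, 3)), cross_entropy_sparse(logits, k))
```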

Here is a basic example of correctly using one_hot labels to compute the loss: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/mnist/mnist_softmax.py
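In that style of example, accuracy is measured by taking the argmax of each network output row, taking the argmax of the matching one-hot label row, and averaging the matches. A framework-free sketch of that computation, with hypothetical function names and plain lists standing in for tensors:

```python
def argmax(values):
    """Index of the largest value; the first one wins on ties (like Python's max)."""
    return max(range(len(values)), key=values.__getitem__)


def batch_accuracy(logits_batch, one_hot_batch):
    """Fraction of rows where the predicted class (argmax of the logits) equals
    the labeled class (argmax of the one-hot row)."""
    if not logits_batch or len(logits_batch) != len(one_hot_batch):
        raise ValueError("batches must be non-empty and of equal length")
    matches = [argmax(out) == argmax(lab) for out, lab in zip(logits_batch, one_hot_batch)]
    return sum(matches) / len(matches)


logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.0, 3.0], [0.2, 0.1, 0.0]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(batch_accuracy(logits, labels))  # 0.5
```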

Here is how the one_hot labels are constructed for the MNIST digits: https://github.com/tensorflow/tensorflow/blob/438604fc885208ee05f9eef2d0f2c630e1360a83/tensorflow/contrib/learn/python/learn/datasets/mnist.py#L69
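The linked helper builds all one-hot rows at once: it allocates a zero-filled array of shape (num_labels, num_classes), computes a flat offset of row_index × num_classes for each row, and writes a 1 at offset + label. A pure-Python sketch of that indexing idea follows; the function name labels_to_one_hot and the range check are additions for illustration, not part of the linked code.

```python
def labels_to_one_hot(labels, num_classes):
    """Build one-hot rows by writing 1s into a flat zero buffer at
    row * num_classes + label, then splitting the buffer into rows."""
    flat = [0] * (len(labels) * num_classes)
    for row, label in enumerate(labels):
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is out of range for {num_classes} classes")
        flat[row * num_classes + label] = 1  # row offset + class index
    return [flat[r * num_classes:(r + 1) * num_classes] for r in range(len(labels))]


print(labels_to_one_hot([0, 2, 1], 3))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```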

Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.