
Poor accuracy in TensorFlow CNN implementation

I'm trying to implement a 5-layer deep convolutional neural network in TensorFlow, with 3 convolutional layers followed by 2 fully connected layers. My current implementation is below.

def deepnn(x):

    x_image = tf.reshape(x, [-1, FLAGS.img_width, FLAGS.img_height, FLAGS.img_channels])
    img_summary = tf.summary.image('Input_images', x_image)

    with tf.variable_scope('Conv_1'):
        W_conv1 = weight_variable([5, 5, FLAGS.img_channels, 32])
        tf.add_to_collection('decay_weights',W_conv1)
        b_conv1 = bias_variable([32])
        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1,2) + b_conv1)
        h_pool1 = avg_pool_3x3(h_conv1)

    with tf.variable_scope('Conv_2'):
        W_conv2 = weight_variable([5, 5, 32, 32])
        tf.add_to_collection('decay_weights',W_conv2)
        b_conv2 = bias_variable([32])
        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2,2) + b_conv2)
        h_pool2 = avg_pool_3x3(h_conv2)

    with tf.variable_scope('Conv_3'):
        W_conv3 = weight_variable([5, 5, 32, 64])
        tf.add_to_collection('decay_weights',W_conv3)
        b_conv3 = bias_variable([64])
        h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3,2) + b_conv3)
        h_pool3 = max_pool_3x3(h_conv3)

    with tf.variable_scope('FC_1'):
        h_pool3_flat = tf.reshape(h_pool3,[-1,4*4*64])
        W_fc1 = weight_variable([4*4*64,64])
        tf.add_to_collection('decay_weights',W_fc1)
        b_fc1 = bias_variable([64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool3_flat,W_fc1) + b_fc1)

    with tf.variable_scope('FC_2'):
        W_fc2 = weight_variable([64, FLAGS.num_classes])
        tf.add_to_collection('decay_weights',W_fc2)
        b_fc2 = bias_variable([FLAGS.num_classes])
        y_fc2 = tf.matmul(h_fc1, W_fc2) + b_fc2

    with tf.variable_scope('softmax'):
        y_conv = tf.nn.softmax(y_fc2)

    return y_conv, img_summary

def conv2d(x, W,p):
    output = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='VALID', name='convolution')
    return tf.pad(output, tf.constant([[0,0],[p, p,],[p, p],[0,0]]), "CONSTANT")


def avg_pool_3x3(x):
    output = tf.nn.avg_pool(x, ksize=[1, 3, 3, 1],
                          strides=[1, 2, 2, 1], padding='VALID', name='pooling')
    return tf.pad(output, tf.constant([[0,0],[0, 1,],[0, 1],[0,0]]), "CONSTANT")

def max_pool_3x3(x):
    output = tf.nn.max_pool(x, ksize=[1, 3, 3, 1],
                          strides=[1, 2, 2, 1], padding='VALID', name='pooling2')
    return tf.pad(output, tf.constant([[0,0],[0, 1], [0, 1],[0,0]]), "CONSTANT")

def weight_variable(shape):
    weight_init = tf.random_uniform(shape, -0.05,0.05)
    return tf.Variable(weight_init, name='weights')

def bias_variable(shape):
    bias_init = tf.random_uniform(shape, -0.05,0.05)
    return tf.Variable(bias_init, name='biases')


def main(_):
    tf.reset_default_graph()

    dataset = pickle.load(open('dataset.pkl', 'rb'),encoding='latin1')
    train_dataset = dataset[0]

    learning_rate = 0.01
    current_validation_acc = 1

    with tf.variable_scope('inputs'):
        x = tf.placeholder(tf.float32, [None, FLAGS.img_width * FLAGS.img_height * FLAGS.img_channels])
        y_ = tf.placeholder(tf.float32, [None, FLAGS.num_classes])


    y_conv, img_summary = deepnn(x)


    with tf.variable_scope('softmax_loss'):
        softmax_loss = tf.reduce_mean(tf.negative(tf.log(tf.reduce_sum(tf.multiply(y_conv,y_),1))))

    tf.add_to_collection('losses', softmax_loss)
    loss = tf.add_n(tf.get_collection('losses'), name='total_loss')

    train_step = tf.train.MomentumOptimizer(learning_rate,FLAGS.momentum).minimize(loss)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name='accuracy')
    loss_summary = tf.summary.scalar('Loss', loss)
    acc_summary = tf.summary.scalar('Accuracy', accuracy)

For some unknown reason, the model doesn't seem to improve its accuracy above 10%. I've been banging my head against the wall trying to figure out why. I'm using a softmax loss cost function (as described here) and a momentum optimiser. The dataset used is the GTSRB dataset.

While I can add various deep learning features (such as adaptive learning rates etc) to improve the accuracy, I am suspicious as to why the basic CNN model is performing so poorly.

Is there anything obvious that could explain why it's not learning as expected? Alternatively, is there anything I could try to help diagnose the problem?

Any help would be much appreciated!

I'm using a softmax loss cost function and momentum optimiser.

I believe at least one of the problems is with the loss. This expression is not the cross-entropy loss:

# WRONG!
tf.reduce_mean(tf.negative(tf.log(tf.reduce_sum(tf.multiply(y_conv, y_), 1))))

Take a look at the correct formula in this question. Anyway, you should simply use tf.nn.softmax_cross_entropy_with_logits (and drop the softmax from y_conv, as the loss function applies softmax itself).
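A minimal sketch of the fix, assuming deepnn is changed to return the pre-softmax logits y_fc2 instead of y_conv:

with tf.variable_scope('softmax_loss'):
    # softmax_cross_entropy_with_logits applies softmax internally,
    # so it must be fed raw logits, not probabilities
    softmax_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_fc2))

# if probabilities are still needed (e.g. for prediction), apply softmax separately
y_conv = tf.nn.softmax(y_fc2)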

PS. The CNN architecture looks OK to me; it should get to 60-70% with the right hyper-parameters.

There are a few points that should help:

  1. As mentioned in another answer, the loss function is incorrect; use tf.nn.softmax_cross_entropy_with_logits.
  2. It is good practice, particularly when getting started with deep learning / TensorFlow, to start with a simpler model. You haven't told us how many classes you have, but let's assume you have 10 classes. Just about any simple model should do better than 10%, so this indicates something fundamentally wrong. The wrong move is to elaborate the model further; the right move is to simplify down to logistic regression (just a single matrix multiply followed by a softmax) and check performance (see the sketch after this list). That way you can separate the network architecture from the optimization and loss function (partially, anyway). Then build complexity from there.
  3. Your data: you haven't described the data, and as much as we love the power of neural networks (we do!), understanding and thoughtfully preprocessing the data matters. For example, the famous SVHN dataset (Google Street View House Numbers) is often found to be much easier to classify when some preprocessing is done on the color channels. If you read the fine print of many computer vision papers, there is similar data preprocessing. Perhaps this is not the case here, but simplifying your network to understand the data better (the item above) should help.
  4. Finally, this isn't likely causing your issue, but why are you using tf.pad the way you are? You might find things easier using padding='SAME' instead of padding='VALID', making those tf.pad calls unnecessary.
  5. After all that, use TensorBoard to help analyze performance and how you might improve things. It's worth the trouble of learning it: https://www.tensorflow.org/get_started/summaries_and_tensorboard.
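For point 2, a minimal logistic-regression baseline might look like the sketch below (hypothetical code; it reuses the x and y_ placeholders and the FLAGS values from the question):

# a single matrix multiply plus softmax cross-entropy: a sanity-check baseline
n_inputs = FLAGS.img_width * FLAGS.img_height * FLAGS.img_channels
W = tf.Variable(tf.zeros([n_inputs, FLAGS.num_classes]), name='weights')
b = tf.Variable(tf.zeros([FLAGS.num_classes]), name='biases')
logits = tf.matmul(x, W) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

If even this baseline beats 10%, the problem is in the network or the loss, not the data pipeline.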

I think your model is a little too small.
When I tried your model with more parameters, as below, test accuracy was 86%.

W_conv2 = weight_variable([5, 5, 32, 64]) # feature maps 32=>64
b_conv2 = bias_variable([64])
W_conv3 = weight_variable([5, 5, 64, 128]) # feature maps 64=>128
b_conv3 = bias_variable([128])
W_fc1 = weight_variable([4*4*128,2048]) # fully connected units 64=>2048
b_fc1 = bias_variable([2048])

This design of the conv layers is inspired by the VGG-16 network, in which the number of feature maps is doubled after every stack of conv layers. The right number of feature maps depends on the task, but I think this design principle is useful for the traffic sign recognition task.

If you are interested in my experiment, please refer to my GitHub repo: https://github.com/satojkovic/DeepTrafficSign/tree/sof_test

It's preferable to use:

with tf.variable_scope('Conv_1'):
        W_conv1 = weight_variable([3,3, FLAGS.img_channels, 32])
        W_conv1_2 = weight_variable([3,3, 32, 32])

rather than:

with tf.variable_scope('Conv_1'):
        W_conv1 = weight_variable([5, 5, FLAGS.img_channels, 32])

Two stacked 3x3 convolutions cover roughly the same receptive field as a single 5x5 convolution, but with fewer parameters and an extra non-linearity, so your network loses less fine-grained information.
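A sketch of how the two stacked 3x3 convolutions might be wired up (hypothetical names, using plain tf.nn.conv2d with 'SAME' padding rather than the question's conv2d helper):

with tf.variable_scope('Conv_1'):
    W_conv1 = weight_variable([3, 3, FLAGS.img_channels, 32])
    b_conv1 = bias_variable([32])
    W_conv1_2 = weight_variable([3, 3, 32, 32])
    b_conv1_2 = bias_variable([32])
    # two 3x3 convolutions back to back see a 5x5 neighbourhood overall
    h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1,
                                      strides=[1, 1, 1, 1], padding='SAME') + b_conv1)
    h_conv1_2 = tf.nn.relu(tf.nn.conv2d(h_conv1, W_conv1_2,
                                        strides=[1, 1, 1, 1], padding='SAME') + b_conv1_2)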

Prefer more orthodox pooling parameters, like:

output = tf.nn.max_pool(input, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID', name=identifier)

over:

output = tf.nn.max_pool(x, ksize=[1, 3, 3, 1],
                          strides=[1, 2, 2, 1], padding='VALID', name='pooling2')

This will also save you from padding with a constant. Side note: if you do pad, I think you should pad with something other than zero; I would assume that constant zero-padding adds noise. One last tip: I think your learning rate is way too high; start with something more like 1e-3 or 1e-4.
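A sketch of the helpers without the manual tf.pad calls (an assumption on my part; with 'SAME' convolutions and a clean 2x2/stride-2 pool, the spatial sizes stay simple and no padding fix-up is needed):

def conv2d(x, W):
    # 'SAME' padding preserves the spatial size, so no manual tf.pad is needed
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME', name='convolution')

def max_pool_2x2(x):
    # a 2x2 window with stride 2 halves each spatial dimension exactly
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                          padding='VALID', name='pooling')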

Use AdamOptimizer; it works wonders. It keeps running estimates of the second moment of the gradients as well as the first, which gives it an advantage over the basic MomentumOptimizer.
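Swapping it in is a one-line change (the 1e-3 rate is just the suggestion above, not a tuned value):

train_step = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)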

Good luck to you!

You're assuming data_format = "NWHC":

x_image = tf.reshape(x, [-1, FLAGS.img_width, FLAGS.img_height, FLAGS.img_channels])

but only "NHWC" (default) and "NCHW" are supported.
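So the reshape should put height before width to match NHWC:

x_image = tf.reshape(x, [-1, FLAGS.img_height, FLAGS.img_width, FLAGS.img_channels])

(If img_width == img_height this happens to be harmless, but for non-square images the current reshape scrambles the pixel layout.)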
