
What is going wrong with the training and predictions using TensorFlow?

Please see the code written below.

x = tf.placeholder("float", [None, 80])
W = tf.Variable(tf.zeros([80,2]))
b = tf.Variable(tf.zeros([2]))

y = tf.nn.softmax(tf.matmul(x,W) + b)

y_ = tf.placeholder("float", [None,2])

So here we see that there are 80 features in the data, with only 2 possible outputs. I set up the cross_entropy and the train_step like so:

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(x, W) + b, y_)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Initialize all variables.

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

Then I use this code to "train" my Neural Network.

g = 0
for i in range(len(x_train)):

    _, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})

    g += 1

print "...Trained..."

After training the network, it always produces the same accuracy regardless of how many times I retrain it. That accuracy is 0.856067, and I compute it with this code:

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
0.856067

So this is where the question comes in: is it because my dimensions are too small? Maybe I should break the features into a 10x8 matrix? Or a 4x20 matrix? etc.

Then I try to get the probabilities of the actual test data producing a 0 or a 1, like so:

test_data_actual = genfromtxt('clean-test-actual.csv',delimiter=',')  # Actual Test data

x_test_actual = []
for i in test_data_actual:
    x_test_actual.append(i)
x_test_actual = np.array(x_test_actual)

ans = sess.run(y, feed_dict={x: x_test_actual})

And print out the probabilities:

print ans[0:10]
[[ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]
 [ 1.  0.]]

(Note: it does produce [ 0. 1.] sometimes.)

I then tried to see if applying the "expert" methodology (as in TensorFlow's Deep MNIST for Experts tutorial) would produce better results. Please see the following code.

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 1, 1, 1],
                        strides=[1, 1, 1, 1], padding='SAME')

(Please note how I changed the pooling ksize and strides in order to avoid shape errors.)

W_conv1 = weight_variable([1, 80, 1, 1])
b_conv1 = bias_variable([1])

Here is where the question comes in again. I define the tensor (vector/matrix, if you will) as 1x80 (so one row with 80 features in it), and I continue to do that throughout the rest of the code (please see below).

x_ = tf.reshape(x, [-1,1,80,1])
h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)

Second Convolutional Layer

h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([1, 80, 1, 1])
b_conv2 = bias_variable([1])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

Densely Connected Layer

W_fc1 = weight_variable([80, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 80])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

Dropout

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

Readout

W_fc2 = weight_variable([1024, 2])
b_fc2 = bias_variable([2])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

In the above you'll see that I defined the output as 2 possible answers (also to avoid errors).

Then I set the cross_entropy and the train_step:

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, y_)

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

Start the session.

sess.run(tf.initialize_all_variables())

"Train" the neural network.

g = 0

for i in range(len(x_train)):
    if i%100 == 0:
        train_accuracy = accuracy.eval(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 1.0})

    train_step.run(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 0.5})

    g += 1

print "test accuracy %g"%accuracy.eval(session=sess, feed_dict={
    x: x_test, y_: y_test, keep_prob: 1.0})
test accuracy 0.929267

And, once again, it always produces 0.929267 as the output.

The probabilities on the actual data producing a 0 or a 1 are as follows:

[[ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.96712834  0.03287172]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]
 [ 0.92820859  0.07179145]]

As you can see, there is some variance in these probabilities, but it's typically just the same result.

I know that this isn't a Deep Learning problem. This is obviously a training problem. I know that there should always be some variance in the training accuracy every time you reinitialize the variables and retrain the network, but I just don't know why or where it's going wrong.

The answer is twofold.

One problem is with the dimensions/parameters. The other problem is that the features are being placed in the wrong spot.

W_conv1 = weight_variable([1, 2, 1, 80])
b_conv1 = bias_variable([80])

Notice that the first two numbers in the weight_variable correspond to the dimensions of the patch applied over the input. The last two numbers correspond to the dimensions of the feature tensor: the incoming channels and the number of output features. The bias_variable always takes the final number in the weight_variable.
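For reference, a minimal sketch of how the input reshape and the first layer look with these dimensions (this reuses the conv2d and max_pool_2x2 helpers and the x placeholder from the question; the shape comments are only illustrative):

x_ = tf.reshape(x, [-1, 1, 80, 1])   # each row becomes a 1x80 "image" with 1 channel

h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)   # output shape: [batch, 1, 80, 80]
h_pool1 = max_pool_2x2(h_conv1)                       # pooling with ksize/strides of 1 keeps the shape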

Second Convolutional Layer

W_conv2 = weight_variable([1, 2, 80, 160])
b_conv2 = bias_variable([160])

Here the first two numbers still correspond to the dimensions of the patch applied over the input. The last two numbers correspond to the number of incoming features and the number of weighted features that result from the 80 previous features. In this case, we double the weighted network: 80 x 2 = 160. The bias_variable then takes the final number in the weight_variable. If you were to finish the code at this point, the last number in the weight_variable would have to be 1 in order to prevent dimensional errors due to the shape of the input tensor and the output tensor. But instead, for better predictions, let's add a third convolutional layer.
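A sketch of the second layer wired up with these shapes (again reusing the helpers from the question):

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)   # output shape: [batch, 1, 80, 160]
h_pool2 = max_pool_2x2(h_conv2)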

Third Convolutional Layer

W_conv3 = weight_variable([1, 2, 160, 1])
b_conv3 = bias_variable([1])

Once again, the first two numbers in the weight_variable take the shape of the patch applied over the input. The third number corresponds to the number of weighted features we established in the second convolutional layer. The last number in the weight_variable now becomes 1 so we don't run into any dimension errors on the output that we are predicting. In this case, the output has dimensions of 1, 2.
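A sketch of the third layer; the trailing 1 brings the tensor back to [batch, 1, 80, 1], which is what lets it be flattened to 80 columns below:

h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)   # output shape: [batch, 1, 80, 1]
h_pool3 = max_pool_2x2(h_conv3)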

W_fc2 = weight_variable([80, 1024])
b_fc2 = bias_variable([1024])

Here, the number of neurons is 1024, which is completely arbitrary, but the first number in the weight_variable needs to be something that the dimension of our feature matrix is divisible by. In this case it can be any number (such as 2, 4, 10, 20, 40, 80). Once again, the bias_variable takes the last number in the weight_variable.

At this point, make sure that the last number in h_pool3_flat = tf.reshape(h_pool3, [-1, 80]) corresponds to the first number in the W_fc2 weight_variable.
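To make the wiring concrete, here is a sketch of the densely connected, dropout, and readout layers using the names referenced further down (W_fc3 and b_fc3 are assumed to be the [1024, 2] readout weights, mirroring the original W_fc2/b_fc2 readout; keep_prob is the placeholder defined earlier):

h_pool3_flat = tf.reshape(h_pool3, [-1, 80])                 # flatten to [batch, 80]
h_fc2 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc2) + b_fc2)   # densely connected layer: [batch, 1024]

h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)                 # dropout on the dense layer

W_fc3 = weight_variable([1024, 2])                           # readout: 1024 units -> 2 classes
b_fc3 = bias_variable([2])
y_conv = tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3)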

Now when you run your training program you will notice that the outcome varies and won't always guess all 1's or all 0's.

When you want to predict the probabilities, you have to feed x through the softmax node, y_conv = tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3), like so:

ans = sess.run(y_conv, feed_dict={x: x_test_actual, keep_prob: 1.0})

You can alter the keep_prob value, but keeping it at 1.0 (i.e. no dropout at prediction time) always produces the best results. Now, if you print out ans, you'll see something like this:

[[ 0.90855026  0.09144982]
 [ 0.93020624  0.06979381]
 [ 0.98385173  0.0161483 ]
 [ 0.93948185  0.06051811]
 [ 0.90705943  0.09294061]
 [ 0.95702559  0.04297439]
 [ 0.95543593  0.04456403]
 [ 0.95944828  0.0405517 ]
 [ 0.99154049  0.00845954]
 [ 0.84375167  0.1562483 ]
 [ 0.98449463  0.01550537]
 [ 0.97772813  0.02227189]
 [ 0.98341942  0.01658053]
 [ 0.93026513  0.06973486]
 [ 0.93376994  0.06623009]
 [ 0.98026556  0.01973441]
 [ 0.93210858  0.06789146]

Notice how the probabilities vary. Your training is now working properly.
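If you also want hard 0/1 predictions rather than probabilities, one option (a small sketch, assuming numpy is imported as np as in the question) is to take the argmax over the class axis:

predicted_labels = np.argmax(ans, axis=1)   # 0 or 1 for each row of the test data
print predicted_labels[0:10]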
