Please see the code written below.
x = tf.placeholder("float", [None, 80])
W = tf.Variable(tf.zeros([80,2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,2])
So here we see that there are 80 features in the data with only 2 possible outputs. I set the cross_entropy and the train_step like so:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(tf.matmul(x, W) + b, y_))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
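For reference, the per-example softmax cross-entropy that this op computes can be sketched in plain NumPy (an illustrative sketch, not the TF kernel; the sample logits and labels here are made up). The op returns one loss per example, which is typically averaged into a scalar before it is handed to the optimizer:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max before exponentiating for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_cross_entropy(logits, labels):
    # One loss value per example, like softmax_cross_entropy_with_logits.
    probs = softmax(logits)
    return -(labels * np.log(probs)).sum(axis=1)

logits = np.array([[2.0, 0.5], [0.1, 3.0]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
per_example = softmax_cross_entropy(logits, labels)  # shape (2,)
loss = per_example.mean()  # scalar loss for the optimizer
```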
Initialize all variables.
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
Then I use this code to "train" my Neural Network.
g = 0
for i in range(len(x_train)):
    _, w_out, b_out = sess.run([train_step, W, b], feed_dict={x: [x_train[g]], y_: [y_train[g]]})
    g += 1
print "...Trained..."
After training the network, it always produces the same accuracy rate regardless of how many times I train it. That accuracy rate is 0.856067, and I compute it with this code:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print sess.run(accuracy, feed_dict={x: x_test, y_: y_test})
0.856067
So this is where the question comes in. Is it because my dimensions are too small? Maybe I should break the features into a 10x8 matrix? Maybe a 4x20 matrix? etc.
Then I try to get the probabilities of the actual test data producing a 0 or a 1 like so:
test_data_actual = genfromtxt('clean-test-actual.csv', delimiter=',')  # Actual test data
x_test_actual = []
for i in test_data_actual:
    x_test_actual.append(i)
x_test_actual = np.array(x_test_actual)
ans = sess.run(y, feed_dict={x: x_test_actual})
And print out the probabilities:
print ans[0:10]
[[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]
[ 1. 0.]]
(Note: it does produce [ 0. 1.] sometimes.)
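Outputs pinned at exactly [ 1. 0.] are what a saturated softmax looks like: once the logits are far apart, the probabilities round to 0 and 1 at print precision. A small NumPy illustration (the logit values here are hypothetical, chosen only to show the effect):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

mild = softmax(np.array([[1.0, -1.0]]))       # ~[0.88, 0.12]
saturated = softmax(np.array([[20.0, -20.0]]))  # prints as [ 1.  0.]
```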
I then tried to see if applying the expert methodology would produce better results. Please see the following code.
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 1, 1, 1],
                          strides=[1, 1, 1, 1], padding='SAME')
(Please note how I changed the strides in order to avoid errors.)
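With SAME padding, the spatial output size depends only on the stride, not the kernel size, so these stride-1 conv and pool ops leave the 1x80 input at 1x80. That formula can be sketched like this (same_out_size is my own helper, not a TF function):

```python
import math

def same_out_size(in_size, stride):
    # TF's SAME padding: output spatial size is ceil(input / stride),
    # independent of the kernel size.
    return math.ceil(in_size / stride)

# Stride 1 keeps the 1x80 "image" at 1x80 through conv and pool;
# the canonical 2x2 pool with stride 2 would halve it instead.
unchanged = same_out_size(80, 1)  # 80
halved = same_out_size(80, 2)     # 40
```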
W_conv1 = weight_variable([1, 80, 1, 1])
b_conv1 = bias_variable([1])
Here is where the question comes in again. I define the Tensor (vector/matrix if you will) as 80x1 (so 1 row with 80 features in it); I continue to do that throughout the rest of the code (please see below).
x_ = tf.reshape(x, [-1,1,80,1])
h_conv1 = tf.nn.relu(conv2d(x_, W_conv1) + b_conv1)
Second Convolutional Layer
h_pool1 = max_pool_2x2(h_conv1)
W_conv2 = weight_variable([1, 80, 1, 1])
b_conv2 = bias_variable([1])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)
Densely Connected Layer
W_fc1 = weight_variable([80, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 80])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
Dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
Readout
W_fc2 = weight_variable([1024, 2])
b_fc2 = bias_variable([2])
y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
In the above you'll see that I defined the output as 2 possible answers (also to avoid errors).
Then the cross_entropy and the train_step:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(tf.matmul(h_fc1_drop, W_fc2) + b_fc2, y_))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
Start the session.
sess.run(tf.initialize_all_variables())
"Train" the neural network.
g = 0
for i in range(len(x_train)):
    if i % 100 == 0:
        train_accuracy = accuracy.eval(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 1.0})
    train_step.run(session=sess, feed_dict={x: [x_train[g]], y_: [y_train[g]], keep_prob: 0.5})
    g += 1
print "test accuracy %g" % accuracy.eval(session=sess, feed_dict={
    x: x_test, y_: y_test, keep_prob: 1.0})
test accuracy 0.929267
And, once again, it always produces 0.929267 as the output.
The probabilities on the actual data producing a 0 or a 1 are as follows:
[[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.96712834 0.03287172]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]
[ 0.92820859 0.07179145]]
As you see, there is some variance in these probabilities, but it's typically just the same result.
I know that this isn't a deep-learning problem; it's obviously a training problem. There should be some variance in the training accuracy every time you reinitialize the variables and retrain the network, but I just don't know why or where it's going wrong.
The answer is two-fold: one problem is with the dimensions/parameters, and the other is that the features are being placed in the wrong spot.
W_conv1 = weight_variable([1, 2, 1, 80])
b_conv1 = bias_variable([80])
Notice that the first two numbers in the weight_variable correspond to the dimensions of the input. The second two numbers correspond to the dimensions of the feature tensor. The bias_variable always takes the final number in the weight_variable.
Second Convolutional Layer
W_conv2 = weight_variable([1, 2, 80, 160])
b_conv2 = bias_variable([160])
Here the first two numbers still correspond to the dimensions of the input. The second two numbers correspond to the number of features and the weighted network that results from the 80 previous features. In this case, we double the weighted network: 80x2=160. The bias_variable then takes the final number in the weight_variable. If you were to finish the code at this point, the last number in the weight_variable would be a 1 in order to prevent dimensional errors due to the shape of the input tensor and the output tensor. But, instead, for better predictions, let's add a third convolutional layer.
Third Convolutional Layer
W_conv3 = weight_variable([1, 2, 160, 1])
b_conv3 = bias_variable([1])
Once again, the first two numbers in the weight_variable take the shape of the input. The third number corresponds to the number of weighted variables we established in the second convolutional layer. The last number in the weight_variable now becomes 1 so we don't run into any dimension errors on the output that we are predicting. In this case, the output has the dimensions 1, 2.
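To see why these three filter shapes chain together, you can trace the channel dimension layer by layer (a bookkeeping sketch of my own, not part of the original code):

```python
# Each conv filter is [height, width, in_channels, out_channels]; a layer is
# only valid if its in_channels matches the previous layer's out_channels.
filters = [
    (1, 2, 1, 80),    # W_conv1: 1 input channel -> 80 feature maps
    (1, 2, 80, 160),  # W_conv2: 80 -> 160 (the doubling described above)
    (1, 2, 160, 1),   # W_conv3: 160 -> 1, back to a single channel
]

channels = 1  # the reshaped input x_ has shape [-1, 1, 80, 1]
for h, w, c_in, c_out in filters:
    assert c_in == channels, "in_channels must match previous out_channels"
    channels = c_out
# channels is now 1, so flattening the 1x80x1 output gives 80 features
```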
W_fc2 = weight_variable([80, 1024])
b_fc2 = bias_variable([1024])
Here, the number of neurons is 1024, which is completely arbitrary, but the first number in the weight_variable needs to be something that the dimensions of our feature matrix are divisible by. In this case it can be any number (such as 2, 4, 10, 20, 40, 80). Once again, the bias_variable takes the last number in the weight_variable.
At this point, make sure that the last number in h_pool3_flat = tf.reshape(h_pool3, [-1, 80]) corresponds to the first number in the W_fc2 weight_variable.
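A quick sanity check of that flatten size (the numbers come from the shapes above; the variable names here are my own):

```python
# h_pool3 keeps spatial shape 1x80 with 1 channel (stride-1 SAME ops),
# so flattening yields 1 * 80 * 1 = 80 features per example.
height, width, channels = 1, 80, 1
flat_size = height * width * channels

w_fc2_shape = (80, 1024)  # weight_variable([80, 1024])
# The flattened size must equal the first dimension of W_fc2.
assert flat_size == w_fc2_shape[0]
```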
Now when you run your training program you will notice that the outcome varies and won't always guess all 1's or all 0's.
When you want to predict the probabilities, you have to feed x to the softmax variable y_conv = tf.nn.softmax(tf.matmul(h_fc2_drop, W_fc3) + b_fc3) like so:
ans = sess.run(y_conv, feed_dict={x: x_test_actual, keep_prob: 1.0})
You can alter the keep_prob variable, but keeping it at 1.0 always produces the best results. Now, if you print out ans, you'll have something that looks like this:
[[ 0.90855026 0.09144982]
[ 0.93020624 0.06979381]
[ 0.98385173 0.0161483 ]
[ 0.93948185 0.06051811]
[ 0.90705943 0.09294061]
[ 0.95702559 0.04297439]
[ 0.95543593 0.04456403]
[ 0.95944828 0.0405517 ]
[ 0.99154049 0.00845954]
[ 0.84375167 0.1562483 ]
[ 0.98449463 0.01550537]
[ 0.97772813 0.02227189]
[ 0.98341942 0.01658053]
[ 0.93026513 0.06973486]
[ 0.93376994 0.06623009]
[ 0.98026556 0.01973441]
[ 0.93210858 0.06789146]]
Notice how the probabilities vary. Your training is now working properly.