
Error training CIFAR-10 model in TensorFlow - accuracy is 0, the model will not optimize, and no loss is reported

I am currently trying to train my model to categorize the cifar-10 dataset. I read the data like this:

def convert_images(raw):
    raw_float = np.array(raw, dtype = float)
    images = raw_float.reshape([-1,3,32,32])
    images = images.transpose([0,2,3,1])
    return images

def load_data(filename):
    data = unpickle(filename)
    raw_images = data[b'data']
    labels = np.array(data[b'labels'])
    images = convert_images(raw_images)
    return images, labels

def load_training_data():
    images = np.zeros(shape=[50000,32,32,3], dtype = float)
    labels = np.zeros(shape = [50000], dtype = int)
    begin = 0
    for i in range(5):
        filename = "data_batch_" + str(i+1)
        images_batch, labels_batch = load_data(filename)
        num_images = len(images_batch)
        end = begin + num_images
        images[begin:end, :] = images_batch
        labels[begin:end] = labels_batch
        begin = end
        return images, labels, OneHotEncoder(categorical_features=labels, n_values=10)
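As a side note, here is a minimal sketch (with a hypothetical dummy array, not the real dataset) of why the reshape-then-transpose in convert_images is needed: each CIFAR-10 row is 3072 values stored channel-first, so reshaping to (3, 32, 32) and transposing to (32, 32, 3) gives the usual height-width-channel layout:

import numpy as np

# Hypothetical single "image": 3072 values laid out channel-first
# (1024 red values, then 1024 green, then 1024 blue).
raw = np.arange(3072, dtype=float)

chw = raw.reshape(3, 32, 32)   # (channels, height, width)
hwc = chw.transpose(1, 2, 0)   # (height, width, channels); same idea as transpose([0,2,3,1]) on a batch

# Pixel (0, 0): its red value is element 0 of the row, its green value is element 1024.
assert hwc[0, 0, 0] == raw[0]
assert hwc[0, 0, 1] == raw[1024]
print(hwc.shape)  # (32, 32, 3)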

What the code above does is reshape the data so that it is a 4d array with 32x32x3 values for the pixels and RGB colors. I define my model like this (I first reshape X to be a row vector because the 4d array creates errors):

X = tf.placeholder(tf.float32, [None,32,32,3])
Y_labeled = tf.placeholder(tf.int32, [None])
data = load_training_data()

with tf.name_scope('dnn'):
    XX = tf.reshape(X, [-1,3072])
    hidden1 = tf.layers.dense(XX, 300, name = 'hidden1', activation = tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, 200, name = 'hidden2', activation = tf.nn.relu)
    hidden3 = tf.layers.dense(hidden2, 200, name = 'hidden3', activation = tf.nn.relu)
    hidden4 = tf.layers.dense(hidden3, 100, name = 'hidden4', activation = tf.nn.relu)
    logits = tf.layers.dense(hidden4, 10, name = 'outputs')

with tf.name_scope('loss'):
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels = (Y_labeled), logits = logits)
    loss = tf.reduce_mean(cross_entropy, name = 'loss')

learning_rate = 0.01

with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits,Y_labeled, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()

batch_size = 100
n_epochs = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(50000 // batch_size):
            X_batch = data[0][iteration*batch_size:(iteration+1)*batch_size]
            y_batch = data[1][iteration*batch_size:(iteration+1)*batch_size]
            #X_batch, y_batch = data.train.next_batch(batch_size)
            sess.run(training_op, feed_dict = {X: X_batch,Y_labeled: y_batch})
        acc_train = accuracy.eval(feed_dict = {X: X_batch,Y_labeled: y_batch})
        print(epoch, "train accuracy:", acc_train, "loss", loss)

I want to define a simple model that has 4 hidden layers. When I run this it compiles with no errors and starts "training", but the accuracy is 0.0 and it does not print any losses. I am not sure if the error is in my calculation of accuracy and loss or in my definition of the model.
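(A minimal sketch of how I would expect to read a numeric loss back, assuming the graph, session, and batch defined above; as written, print(..., loss) only prints the loss tensor object, not its value:)

# Hypothetical variant of the reporting line: evaluate accuracy and loss together
# so a number is printed instead of the tensor object.
acc_train, loss_train = sess.run([accuracy, loss],
                                 feed_dict={X: X_batch, Y_labeled: y_batch})
print(epoch, "train accuracy:", acc_train, "loss:", loss_train)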

There seems to be a problem with the way you feed your labels. When you create the placeholder Y_labeled = tf.placeholder(tf.int32, [None, 10]) it seems to be a vector of dimension 10, but later when you create the label numpy tensor labels = np.zeros(shape = [50000], dtype = int) each label seems to be a scalar.

This is why you have this error: the placeholder needs to be fed with a tensor of dimension (batch_size, 10), but you feed it with (batch_size, 0).
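A minimal sketch of the two consistent pairings, with a hypothetical stand-in for your logits tensor (TF1-style API): either keep integer class ids with a rank-1 placeholder and sparse_softmax_cross_entropy_with_logits, or one-hot encode the labels and feed a [None, 10] placeholder to softmax_cross_entropy_with_logits_v2:

import numpy as np
import tensorflow as tf

num_classes = 10
# Stand-in for the network output defined in the question.
logits = tf.placeholder(tf.float32, [None, num_classes])

# Option 1: sparse integer class ids, placeholder shape (batch_size,)
y_sparse = tf.placeholder(tf.int32, [None])
loss_sparse = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y_sparse, logits=logits))

# Option 2: one-hot labels, placeholder shape (batch_size, 10)
y_onehot = tf.placeholder(tf.float32, [None, num_classes])
loss_onehot = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_onehot, logits=logits))

# One-hot encoding an integer label array before feeding, e.g. with numpy:
labels = np.array([3, 7, 1])                 # hypothetical batch of class ids
labels_onehot = np.eye(num_classes)[labels]  # shape (3, 10)

Whichever option you pick, the array you pass in feed_dict has to match the placeholder's shape.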


 