Tensorflow Cifar10 Tutorial example loss is nan

I am currently trying to teach myself TensorFlow. After thorough reading and watching videos, I tried to re-create the example provided at https://www.tensorflow.org/versions/r0.12/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners However, so as not to simply copy and paste, I decided to make small alterations to see whether I actually understand what I am doing, so I chose to work with the CIFAR-10 dataset (small 32x32 RGB images) instead.

The code skeleton is pretty much the basic skeleton presented in the tutorial:

# Imports
import tensorflow as tf
import numpy as np

###
### Open data files (dict)
###

def unpickle(file):
    # Load one pickled CIFAR-10 batch file into a dict (Python 2 / cPickle)
    import cPickle
    with open(file, 'rb') as fo:
        return cPickle.load(fo)

cifar10_test = unpickle('cifar-10-batches-py/test_batch')
cifar10_meta = unpickle('cifar-10-batches-py/batches.meta')
cifar10_batches = [unpickle('cifar-10-batches-py/data_batch_1'),
    unpickle('cifar-10-batches-py/data_batch_2'),
    unpickle('cifar-10-batches-py/data_batch_3'),
    unpickle('cifar-10-batches-py/data_batch_4'),
    unpickle('cifar-10-batches-py/data_batch_5')]

###
### Tensorflow Model
###
x = tf.placeholder("float", [None, 3072])   # flattened 32x32x3 input images
W = tf.Variable(tf.zeros([3072,10]))        # weights
b = tf.Variable(tf.zeros([10]))             # biases
y = tf.nn.softmax(tf.matmul(x,W) + b)       # predicted class probabilities
y_ = tf.placeholder("float", [None,10])     # one-hot target labels

cross_entropy = -tf.reduce_sum(y_*tf.log(y))  # hand-written cross-entropy
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

###
### Model training
###

for batch in cifar10_batches:
    # Convert labels to vector with zeros, but 1 at correct position
    batch['labels_vec'] = np.zeros((10000,10), dtype=float, order='C')
    for i in range(10000):
        batch['labels_vec'][i][batch['labels'][i]] = 1

    # Train in smaller sub-batches
    for i in range(3): # Breaks at first iteration, so no need to go on further
        start = i*100
        stop = start+100
        [_, cross_entropy_py] = sess.run([train_step, cross_entropy],
            feed_dict={x: batch['data'][start:stop],
            y_: batch['labels_vec'][start:stop]})
        print 'loss = %s' % cross_entropy_py
    break # Only first batch for now

This leaves me with the following output:

loss = 230.259
loss = nan
loss = nan

No error is shown on the console. I tried searching for people with the same problem, but only found questions about different scenarios that also resulted in "nan" values.

The only things I changed from the online tutorial: the dataset originally used contains handwritten digits as 28x28 greyscale pixels, i.e. only 784 values instead of 3072. However, I believe this should not fundamentally change much, as I also adjusted the dimensions of the placeholders accordingly.

Additionally, my label values were given as a list of numbers between 0 and 9. I changed these to one-hot vectors, where the correct position is marked with a 1; e.g. a label of 3 becomes [0 0 0 1 0 0 0 0 0 0].
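For reference, the loop in the training code above builds these vectors by hand; the same conversion can be written as a single vectorized NumPy operation (a sketch; the helper name one_hot is made up for illustration, not from the original code):

import numpy as np

def one_hot(labels, num_classes=10):
    # Pick rows of the identity matrix by label index,
    # so e.g. 3 becomes [0 0 0 1 0 0 0 0 0 0].
    return np.eye(num_classes, dtype=float)[np.asarray(labels)]

print one_hot([3, 0, 9])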

Some hints on where I should aim my debugging would be helpful. I originally used a bigger step size of 0.1 for the GradientDescentOptimizer, but reduced it to 0.01 (the original value from the tutorial) after reading that too large a step size may cause the loss to diverge to nan.
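One way to narrow the debugging down (a sketch, assuming the model above is otherwise unchanged) is to fetch the softmax output y together with the loss and check whether any predicted probability collapses to exactly 0, since tf.log(0) yields -inf and 0 * -inf propagates as nan:

# Inside the training loop: also fetch y to inspect the predictions
[_, loss_val, y_val] = sess.run([train_step, cross_entropy, y],
    feed_dict={x: batch['data'][start:stop],
               y_: batch['labels_vec'][start:stop]})
print 'loss = %s, min predicted prob = %s' % (loss_val, y_val.min())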

Thank you in advance.

Your loss is not numerically stable. Instead of your hand-written loss, you can use a loss that is already implemented for multiclass logistic regression: sigmoid_cross_entropy_with_logits . It was carefully designed to avoid numerical problems.
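A minimal sketch of that change applied to the model above (this adaptation uses tf.nn.softmax_cross_entropy_with_logits, the softmax variant of the suggested op, which matches the one-hot labels here; it is not code from the answer itself):

# Keep the unnormalized scores ("logits"); the op applies softmax
# and log internally in a numerically stable way, so softmax must
# NOT be applied to its input beforehand.
logits = tf.matmul(x, W) + b
y = tf.nn.softmax(logits)  # still usable for predictions

# Stable replacement for -tf.reduce_sum(y_ * tf.log(y)),
# averaged over the examples in the batch.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)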
