Cost function always returning zero for a binary classification in tensorflow

I have written the following binary classification program in tensorflow, and it is buggy: the cost comes back as zero all the time, no matter what the input is. I am trying to debug a larger program which is not learning anything from the data, and I have narrowed at least one bug down to the cost function always returning zero. The given program uses some random inputs and has the same problem. self.X_train and self.y_train are originally supposed to be read from files, and the function self.predict() has more layers forming a feedforward neural network.

import numpy as np
import tensorflow as tf

class annClassifier():

    def __init__(self):

        with tf.variable_scope("Input"):
             self.X = tf.placeholder(tf.float32, shape=(100, 11))

        with tf.variable_scope("Output"):
            self.y = tf.placeholder(tf.float32, shape=(100, 1))

        self.X_train = np.random.rand(100, 11)
        self.y_train = np.random.randint(0,2, size=(100, 1))

    def predict(self):

        with tf.variable_scope('OutputLayer'):
            weights = tf.get_variable(name='weights',
                                      shape=[11, 1],
                                      initializer=tf.contrib.layers.xavier_initializer())
            bases = tf.get_variable(name='bases',
                                    shape=[1],
                                    initializer=tf.zeros_initializer())
            final_output = tf.matmul(self.X, weights) + bases

        return final_output

    def train(self):

        prediction = self.predict()
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=self.y))

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())         
            print(sess.run(cost, feed_dict={self.X:self.X_train, self.y:self.y_train}))


with tf.Graph().as_default():
    classifier = annClassifier()
    classifier.train()

If someone could please figure out what I am doing wrong here, I can try making the same change in my original program. Thanks a lot!

The only problem is the invalid cost being used. softmax_cross_entropy_with_logits should be used if you have more than two classes, as the softmax of a single output always returns 1, since it is defined as:

softmax(x)_i = exp(x_i) / SUM_j exp(x_j)

so for a single number (one-dimensional output)

softmax(x) = exp(x) / exp(x) = 1

Furthermore, for a softmax output TF expects one-hot encoded labels, so if you provide only 0 or 1, there are two possibilities (a minimal sketch follows the list):

  1. True label is 0, so the cost is -0*log(1) = 0
  2. True label is 1, so the cost is -1*log(1) = 0
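
To make the failure concrete, here is a minimal, self-contained sketch (assuming TensorFlow 1.x, as in the question; the logit and label values are made up) showing that with a single logit column the softmax is identically 1 and the cross entropy collapses to 0 whatever the labels are:

import tensorflow as tf

# One logit per example (shape (3, 1)) and plain 0/1 labels in the same shape.
logits = tf.constant([[2.3], [-1.7], [0.4]])
labels = tf.constant([[1.0], [0.0], [1.0]])

cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(logits)))  # [[1.], [1.], [1.]] -- softmax over a single column
    print(sess.run(cost))                   # 0.0 -- the cost the question observes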

Tensorflow has a separate function to handle binary classification, which applies a sigmoid instead (note that the same function applied to more than one output would apply the sigmoid independently on each dimension, which is what multi-label classification expects):

tf.nn.sigmoid_cross_entropy_with_logits

Just switch to this cost and you are good to go; you do not have to one-hot encode anything anymore either, as this function is designed exactly for your use case.
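
Applied to the code in the question, this is a one-line change inside train() (a sketch; prediction and self.y are the tensors already defined there):

# Sigmoid cross entropy handles a single-logit binary output directly;
# labels stay as a (100, 1) column of 0/1 values, no one-hot encoding needed.
cost = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=self.y))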

The only missing bit is that your code does not have an actual training routine: you need to define an optimiser, ask it to minimise the loss, and then run the train op in a loop. In your current setting you just try to predict over and over with a network that never changes.
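
As a sketch of what that missing routine could look like inside train() (the optimiser, learning rate, and step count are illustrative choices, not part of the original code; cost is the sigmoid cross entropy from above):

# Define an optimiser and ask it to minimise the loss.
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {self.X: self.X_train, self.y: self.y_train}
    # Run the train op in a loop; the reported cost should now decrease
    # instead of sitting at zero.
    for step in range(100):
        _, c = sess.run([train_op, cost], feed_dict=feed)
        if step % 10 == 0:
            print(step, c)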

In particular, please refer to the Cross Entropy Jungle question on SO, which provides a more detailed description of all these different helper functions in TF (and other libraries) and their different requirements/use cases.

softmax_cross_entropy_with_logits is basically a stable implementation of these two parts:

softmax = tf.nn.softmax(prediction)
cost = -tf.reduce_mean(labels * tf.log(softmax), 1)

Now in your example, prediction is a single value, so when you apply softmax to it, it is always going to be 1 irrespective of the value (exp(prediction)/exp(prediction) = 1), and so the tf.log(softmax) term becomes 0. That is why you always get a cost of zero.

Either apply a sigmoid to get your probabilities between 0 and 1, or if you want to use softmax, encode the labels one-hot: [1, 0] for class 0 and [0, 1] for class 1.
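
Both options in code, as a rough sketch against the placeholders from the question (weights2/bases2 are hypothetical names for a two-unit output layer; shapes follow the 11-feature, 100-example input):

# Option 1: single output unit, sigmoid loss, labels stay as a (100, 1) column of 0/1.
cost_sigmoid = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=prediction, labels=self.y))

# Option 2: two output units, one-hot labels, softmax loss.
weights2 = tf.get_variable('weights2', shape=[11, 2],
                           initializer=tf.contrib.layers.xavier_initializer())
bases2 = tf.get_variable('bases2', shape=[2], initializer=tf.zeros_initializer())
logits2 = tf.matmul(self.X, weights2) + bases2                      # shape (100, 2)
y_onehot = tf.one_hot(tf.cast(tf.squeeze(self.y, axis=1), tf.int32), depth=2)
cost_softmax = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits2, labels=y_onehot))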
