
Role of activation function in calculating the cost function for artificial neural networks

I have some difficulty understanding the roles of activation functions and cost functions. Let's look at a simple example. Say I am building an artificial neural network, and I have 5 "x" variables and one "y" variable.

If I do the usual feature scaling and then apply, for example, a ReLU activation function in the hidden layer, this activation function transforms the data, and as a result the predicted output value (y hat) lies between 0 and, let's say, M. The next step is to calculate the cost function.

To calculate the cost function, however, we need to compare the output value (y hat) with the actual value (y).

The question is: how can we compare the transformed output value (y hat), which lies between 0 and M, with the untransformed actual value (y), which can be any number because it has not been subjected to the ReLU activation function, in order to calculate the cost function? There can be a large mismatch, since one variable has been transformed and the other has not.

Thank you for any help.

It sounds like you are performing a regression task, since you describe your final output as "the untransformed actual value (y) (which can be any number as it is not been subjected to the Relu activation function)."

In that case, you do not use an activation function on the final output layer of the neural network because, just as you point out, the prediction is not meant to be constrained to any particular activated region of the real numbers. It is allowed to be any real number, and training uses the gradient of the loss function to adjust the network's parameters, including those in the earlier layers, so that this unconstrained final output value becomes accurate.
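To make this concrete, here is a minimal sketch in plain Python of a forward pass with a ReLU hidden layer and a linear (no activation) output neuron. The weights and inputs are made-up values chosen for illustration, not taken from any trained model. The point is only that the hidden activations are non-negative while the final prediction can still be negative.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """One hidden ReLU layer followed by a linear output neuron."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    # No activation on the output: y_hat is an unconstrained real number.
    y_hat = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return hidden, y_hat


# Hypothetical values: 5 scaled input features, 2 hidden units.
x = [1.0, -2.0, 0.5, 0.0, 3.0]
hidden_weights = [[1.0, 0.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0, 0.0]]
hidden_biases = [0.0, 0.0]
out_weights = [-3.0, 2.0]
out_bias = 0.5

hidden, y_hat = forward(x, hidden_weights, hidden_biases, out_weights, out_bias)
print(hidden)  # [1.0, 0.0]
print(y_hat)   # -2.5
```

Only the hidden layer is clipped at zero. The output neuron is a weighted sum, so a negative output weight is enough to produce a negative y hat that can be compared directly with an untransformed y.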

For an example, see the Basic Regression TensorFlow Keras tutorial. You can see from the model layer definitions:

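import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
  # train_dataset is the feature DataFrame prepared earlier in the tutorial.
  model = keras.Sequential([
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1)  # no activation: the output can be any real number
  ])

  # TensorFlow 2.x replacement for the older tf.train.RMSPropOptimizer.
  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model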

It uses a mean-squared error loss, and the final layer is just a plain Dense(1) with no activation.

When the output is a binary or multi-class classification prediction, you do still apply an activation to the final layer, and it transforms the values into relative scores that indicate the model's prediction for each category.

For example, if you wanted to predict a label for a 4-category prediction task, your output layer would be something like Dense(4, activation=tf.nn.softmax), where the softmax activation converts the raw values of those 4 neurons into relative scores.

In that case it is typical to take the highest-scoring output neuron as the predicted category label. However, categorical loss functions, like cross-entropy loss, use the relative values of the scores for all neurons to dole out loss in proportion to how accurate the prediction was, rather than a 0-1 loss, which would give the maximum loss for any incorrect prediction regardless of how close or far it was from being correct.
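The difference between the two kinds of loss can be sketched in plain Python. The logits and the correct category below are hypothetical, and the helper names are mine rather than part of any library. Both predictions pick the wrong category, so a 0-1 loss treats them identically, while cross-entropy penalizes the confident miss much more heavily.

```python
import math


def softmax(logits):
    """Convert raw output-neuron values into scores that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross-entropy loss for a single example with one correct class."""
    return -math.log(probs[true_index])


def zero_one_loss(probs, true_index):
    """1 if the highest-scoring class is wrong, else 0."""
    predicted = max(range(len(probs)), key=lambda i: probs[i])
    return 0 if predicted == true_index else 1


true_index = 1  # hypothetical correct category out of 4

near_miss = softmax([2.0, 1.9, 0.0, 0.0])  # wrong, but class 1 is a close second
far_miss = softmax([5.0, 0.0, 0.0, 0.0])   # wrong, and confident about it

for name, probs in [("near miss", near_miss), ("far miss", far_miss)]:
    print(name, zero_one_loss(probs, true_index), cross_entropy(probs, true_index))
```

Because cross-entropy varies smoothly with the scores, it also provides a useful gradient, which a 0-1 loss does not.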

- A cost function is a measure of the error between the value your model predicts and the actual value. For example, say we wish to predict the value $y_i$ for data point $x_i$. Let $f_\theta(x_i)$ represent the prediction or output of some arbitrary model with parameters $\theta$ for the point $x_i$. One of many possible cost functions is

$$\sum_{i=1}^{n} \left(y_i - f_\theta(x_i)\right)^2$$

This function is known as the L2 loss. Training the hypothetical model described above is the process of finding the $\theta$ that minimizes this sum.
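As a rough illustration of "finding the parameters that minimize this sum", the sketch below assumes a deliberately tiny one-parameter model f(x) = theta * x and plain gradient descent. The data, the learning rate, and the number of steps are made up for the example.

```python
def l2_loss(theta, xs, ys):
    """Sum of squared errors for the one-parameter model f_theta(x) = theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))


def l2_gradient(theta, xs, ys):
    """Derivative of the L2 loss with respect to theta."""
    return sum(-2.0 * x * (y - theta * x) for x, y in zip(xs, ys))


# Made-up data generated by y = 2x, so the minimizing theta is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

theta = 0.0
learning_rate = 0.01
for _ in range(200):
    # Step against the gradient to reduce the loss.
    theta -= learning_rate * l2_gradient(theta, xs, ys)

print(theta, l2_loss(theta, xs, ys))
```

Note that the targets are used exactly as given. Nothing in the loss requires them to pass through an activation function.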

- An activation function transforms the shape/representation of the data going into it. A simple example is $\max(0, x_i)$, a function that outputs 0 if the input $x_i$ is negative and $x_i$ otherwise. This function is known as the "ReLU" or "Rectified Linear Unit" activation function. Which function(s) are best for a specific problem and a particular neural architecture is still under a lot of discussion. However, these representations are essential for making high-dimensional data linearly separable, which is one of the many uses of a neural network.
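One way to see the linear-separability point is the classic XOR problem, which no single linear function of the two raw inputs can solve. The sketch below uses two ReLU units with hand-picked weights (an assumption for illustration, not weights found by training) and then a purely linear readout on the transformed representation.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def hidden_representation(x1, x2):
    """Two ReLU units with hand-picked (not learned) weights."""
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1, h2


def linear_output(h1, h2):
    """A purely linear readout on top of the hidden representation."""
    return h1 - 2.0 * h2


xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outputs = [linear_output(*hidden_representation(x1, x2)) for x1, x2 in xor_inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```

Without the ReLU, the two layers would collapse into a single linear map and the XOR outputs could not be reproduced.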

I hope this gave a decent idea of what these things are. If you wish to learn more, I suggest you go through Andrew Ng's machine learning course on Coursera. It provides a wonderful introductory look into the field.

The target value you compare your model's output to in the cost function doesn't (intrinsically) have anything to do with the input you used to get the output. It doesn't get transformed in any way.

Your expected value is [10,200,3], but you used Softmax on the output layer with RMSE loss? Well, too bad, you're going to have a high cost all the time (and the model probably won't converge).
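The mismatch can be checked numerically with a small sketch. The candidate logits are arbitrary values I picked, and the helpers are written by hand rather than taken from a library. Because every softmax output lies between 0 and 1, the RMSE against this target cannot drop below about 115, no matter what the last layer produces.

```python
import math


def softmax(logits):
    """Convert raw values into scores between 0 and 1 that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rmse(predicted, target):
    """Root-mean-squared error between two equal-length lists."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


target = [10.0, 200.0, 3.0]

# Whatever raw values the last layer produces, softmax bounds them to 0..1.
candidate_logits = [[0.0, 0.0, 0.0], [10.0, 200.0, 3.0], [-50.0, 1000.0, -50.0]]
losses = [rmse(softmax(logits), target) for logits in candidate_logits]
print(losses)

# The raw (unactivated) values can match the target exactly.
print(rmse([10.0, 200.0, 3.0], target))  # 0.0
```

Dropping the softmax, as in a regression output layer, removes that floor on the loss.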

It's on you to use the right cost function, so that it serves as a sane heuristic for evaluating model performance, and the right activations, so that you get sane outputs for the task at hand.
