Cost function computation for neural network
I'm on week 5 of Andrew Ng's Machine Learning course on Coursera. This week I'm working through the programming assignment in MATLAB, and I chose a for-loop implementation to compute the cost J. Here is my function.
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight
% matrices for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% add bias to X to create 5000x401 matrix
X = [ones(m, 1) X];

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% initialize summing terms used in cost expression
sum_i = 0.0;

% loop through each sample to calculate the cost
for i = 1:m
    % logical vector output for 1 example
    y_i = zeros(num_labels, 1);
    class = y(m);
    y_i(class) = 1;

    % first layer just equals features in one example 1x401
    a1 = X(i, :);

    % compute z2, a 25x1 vector
    z2 = Theta1*a1';

    % compute activation of z2
    a2 = sigmoid(z2);

    % add bias to a2 to create a 26x1 vector
    a2 = [1; a2];

    % compute z3, a 10x1 vector
    z3 = Theta2*a2;

    % compute activation of z3. returns output vector of size 10x1
    a3 = sigmoid(z3);
    h = a3;

    % loop through each class k to sum cost over each class
    for k = 1:num_labels
        % sum_i returns cost summed over each class
        sum_i = sum_i + ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k))));
    end
end

J = sum_i/m;
I know a vectorized implementation would be easier, but I don't understand why this implementation is wrong. With num_labels = 10, this function outputs J = 8.47, but the expected cost is 0.287629. I computed J from this formula. Am I misunderstanding the computation? My understanding is that the cost for each training example is computed for each of the 10 classes, and then the costs of all 10 classes are summed over every example. Is that incorrect? Or did I fail to implement it correctly in my code? Thanks in advance.
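(For reference, the formula mentioned above did not survive as an image. As far as I know, the unregularized cost the ex4 assignment asks for — and which the double loop above is meant to implement — is the following, with K = num_labels:)

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
    \Big[ -y_k^{(i)} \, \log\!\big( (h_\theta(x^{(i)}))_k \big)
          - \big(1 - y_k^{(i)}\big) \, \log\!\big( 1 - (h_\theta(x^{(i)}))_k \big) \Big]
```

Here y^(i) is the one-hot encoding of the label of example i, and (h_θ(x^(i)))_k is the k-th output unit's activation.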
The problem is in the formula you are implementing.

This expression ((-1*y_i(k) * log(h(k))) - ((1 - y_i(k)) * log(1 - h(k))))

represents the loss in binary classification, because there you have only 2 classes, so either:

y_i is 0, so (1 - y_i) = 1

y_i is 1, so (1 - y_i) = 0

so you basically consider only the probability of the target class.

However, in your case of 10 labels, it is not required that one of (y_i) and (1 - y_i) be 0 and the other be 1.

You should correct the loss function implementation so that you only consider the probability of the target class, not all the other classes as well.
My problem was indexing. Instead of

class = y(m)

it should be

class = y(i)

because i is the loop index, while m is 5000, the number of rows in the training data.
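To see why that one-character fix matters so much, here is a minimal sketch of the same double loop in Python/NumPy (not the course's MATLAB code, and the data is made up: predictions are faked to be confident on the true class, roughly like a trained network's output h). `class = y(m)` one-hot encodes the last example's label for every example, so most examples are scored against the wrong target vector and the cost blows up:

```python
import numpy as np

# Hypothetical stand-in data (not the course's ex4 data): m examples, K classes.
m, K = 200, 10
y = (np.arange(m) % K) + 1            # 1-based labels, as in the MATLAB code

# Fake "trained" predictions: high probability on the true class, low elsewhere.
h = np.full((m, K), 0.05)
h[np.arange(m), y - 1] = 0.9

def loop_cost(label_of):
    """Mirror the question's double loop: accumulate per-class log loss."""
    total = 0.0
    for i in range(m):
        y_i = np.zeros(K)
        y_i[label_of(i) - 1] = 1.0    # one-hot encode (1-based -> 0-based)
        for k in range(K):
            total += -y_i[k] * np.log(h[i, k]) - (1 - y_i[k]) * np.log(1 - h[i, k])
    return total / m

buggy = loop_cost(lambda i: y[m - 1])  # class = y(m): always the LAST label
fixed = loop_cost(lambda i: y[i])      # class = y(i): this example's label

print(fixed, buggy)  # fixed is small; buggy is several times larger
```

With `y(i)` every example is compared against its own one-hot target, so the cost stays small when the predictions are good; with `y(m)` roughly 9 out of 10 examples are compared against someone else's label, which is the kind of inflation (8.47 vs 0.287629) seen in the question.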