I have created a simple neural network with 3 layers, following this Python example: Link (PS: you have to scroll down until you reach Part 2).
This is my Java implementation of the code:
private void trainNet()
{
    // INPUT is a 4*3 matrix
    // SYNAPSES is a 3*4 matrix
    // SYNAPSES2 is a 4*1 matrix

    // 4*3 matrix DOT 3*4 matrix => 4*4 matrix: unrefined test results
    double[][] layer1 = sigmoid(dot(inputs, synapses), false);
    // 4*4 matrix DOT 4*1 matrix => 4*1 matrix: 4 final test results
    double[][] layer2 = sigmoid(dot(layer1, synapses2), false);
    // 4*1 matrix - 4*1 matrix => 4*1 matrix: error of 4 test results
    double[][] layer2Error = subtract(outputs, layer2);
    // 4*1 matrix DOT 4*1 matrix => 4*1 matrix: percentage of change of 4 test results
    double[][] layer2Delta = dot(layer2Error, sigmoid(layer2, true));
    // 4*1 matrix DOT 3*1 matrix => 4*1 matrix
    double[][] layer1Error = dot(layer2Delta, synapses2);
    // 4*1 matrix DOT 4*4 matrix => 4*4 matrix: percentage of change of 4 test results
    double[][] layer1Delta = dot(layer1Error, sigmoid(layer1, true));

    double[][] transposedInputs = transpose(inputs);
    double[][] transposedLayer1 = transpose(layer1);

    // 4*4 matrix DOT 4*1 matrix => 4*1 matrix: the updated weights
    synapses2 = sum(synapses2, dot(transposedLayer1, layer2Delta));
    // 3*4 matrix DOT 4*4 matrix => 3*4 matrix: the updated weights
    synapses = sum(synapses, dot(transposedInputs, layer1Delta));

    // Test each value of two 4*1 matrices with each other
    testValue(layer2, outputs);
}
The dot, sum, subtract and transpose functions I wrote myself, and I'm pretty sure they do their job correctly.
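For reference, a matrix (dot) product and an element-wise product are different operations, and the delta steps in the referenced Python example multiply element-wise (numpy's `*`), not with a dot product. A minimal sketch of both, using hypothetical helper names to illustrate the difference:

```java
// Hypothetical helpers contrasting a true matrix product with an
// element-wise (Hadamard) product.
public class MatrixOps {
    // Matrix product: (m*n) DOT (n*p) => (m*p)
    static double[][] dot(double[][] a, double[][] b) {
        int m = a.length, n = b.length, p = b[0].length;
        double[][] out = new double[m][p];
        for (int i = 0; i < m; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < p; j++)
                    out[i][j] += a[i][k] * b[k][j];
        return out;
    }

    // Element-wise product: both operands must have the same shape.
    static double[][] hadamard(double[][] a, double[][] b) {
        double[][] out = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                out[i][j] = a[i][j] * b[i][j];
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // dot: row-by-column sums; hadamard: entry-by-entry products
        System.out.println(dot(a, b)[0][0]);      // 1*5 + 2*7 = 19.0
        System.out.println(hadamard(a, b)[0][0]); // 1*5 = 5.0
    }
}
```

If a self-written `dot` is used where an element-wise product is expected, the shapes may still happen to line up while the numbers come out wrong, which is easy to miss.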
The first batch of inputs gives me an error of about 0.4, which is reasonable since the weights start out random. On the second run the error margin is smaller, but only by a very tiny amount (0.001).
After 500,000 batches (so 2,000,000 tests in total) the network still hadn't produced a single correct value! So I tried an even larger number of batches. With 1,000,000 batches (4,000,000 tests in total), the network produced a whopping 16,900 correct results.
Could anyone please tell me what's going on?
These were the weights used:
First layer:
Second layer:
-272.83589796861514
EDIT: Thanks to lsnare for pointing out that using a library would be way easier!
For those interested, here is the working code using the JAMA library from math.nist.gov/javanumerics:
private void trainNet()
{
    // INPUT is a 4*3 matrix
    // SYNAPSES is a 3*4 matrix
    // SYNAPSES2 is a 4*1 matrix

    // 4*3 matrix DOT 3*4 matrix => 4*4 matrix: unrefined test results
    Matrix hiddenLayer = sigmoid(inputs.times(synapses), false);
    // 4*4 matrix DOT 4*1 matrix => 4*1 matrix: 4 final test results
    Matrix outputLayer = sigmoid(hiddenLayer.times(synapses2), false);
    // 4*1 matrix - 4*1 matrix => 4*1 matrix: error of 4 test results
    Matrix outputLayerError = outputs.minus(outputLayer);
    // 4*1 matrix TIMES 4*1 matrix (element-wise) => 4*1 matrix: percentage of change of 4 test results
    Matrix outputLayerDelta = outputLayerError.arrayTimes(sigmoid(outputLayer, true));
    // 4*1 matrix DOT 1*4 matrix => 4*4 matrix
    Matrix hiddenLayerError = outputLayerDelta.times(synapses2.transpose());
    // 4*4 matrix TIMES 4*4 matrix (element-wise) => 4*4 matrix: percentage of change of 4 test results
    Matrix hiddenLayerDelta = hiddenLayerError.arrayTimes(sigmoid(hiddenLayer, true));

    // 4*4 matrix DOT 4*1 matrix => 4*1 matrix: the updated weights
    synapses2 = synapses2.plus(hiddenLayer.transpose().times(outputLayerDelta));
    // 3*4 matrix DOT 4*4 matrix => 3*4 matrix: the updated weights
    synapses = synapses.plus(inputs.transpose().times(hiddenLayerDelta));

    // Test each value of two 4*1 matrices with each other
    testValue(outputLayer.getArrayCopy(), outputs.getArrayCopy());
}
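The `sigmoid(matrix, deriv)` helper is not shown above. A plain-array sketch of the convention the referenced example uses (when `deriv` is true, the input is assumed to already be sigmoid output, so the derivative simplifies to x * (1 - x)):

```java
public class Sigmoid {
    // Applies the logistic function 1 / (1 + e^-x) element-wise.
    // When deriv is true, x is assumed to already be sigmoid output,
    // so the derivative is computed as x * (1 - x).
    static double[][] sigmoid(double[][] x, boolean deriv) {
        double[][] out = new double[x.length][x[0].length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++)
                out[i][j] = deriv
                        ? x[i][j] * (1 - x[i][j])
                        : 1.0 / (1.0 + Math.exp(-x[i][j]));
        return out;
    }

    public static void main(String[] args) {
        double[][] z = {{0.0}};
        System.out.println(sigmoid(z, false)[0][0]);                  // 0.5
        System.out.println(sigmoid(sigmoid(z, false), true)[0][0]);   // 0.5 * (1 - 0.5) = 0.25
    }
}
```

The same logic can be wrapped around JAMA's `Matrix.getArray()` / `new Matrix(double[][])` to get the `sigmoid(Matrix, boolean)` signature used in the code above.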
In general, when writing code that involves advanced mathematical or numerical computation (such as linear algebra), it's best to use existing libraries written by experts in the field rather than write your own functions. Standard libraries produce more accurate results and are most likely more efficient. For example, in the blog you reference, the author uses the numpy library to compute dot products and matrix transposes. For Java, you could use the Java Matrix Package (JAMA) developed at NIST: http://math.nist.gov/javanumerics/jama/
For example, to transpose a matrix:
double[][] in = {{0,0,1},{0,1,1},{1,0,1},{1,1,1}};
Matrix input = new Matrix(in);
input = input.transpose();
I'm not sure if this will solve your issue completely, but hopefully this could help save you writing extra code in the future.