I'm just starting to learn about neural networks, to see if they could be useful for me.
I downloaded a simple Python implementation of a 3-layer feed-forward neural network,
and I modified the training pattern from XOR to a checkerboard, and the number of nodes in the hidden layer to 10. If I understand the universal approximation theorem correctly, this 3-layer network (one hidden layer) should be able to learn any function from R2 to R, including my checkerboard function. ... But it does not.
What is wrong?
Is there a general approach to modifying the optimization (learning) scheme so that it is sure to converge to any function that, according to the universal approximation theorem, can be represented by the given network?
Is there a general test that will tell me whether my network (with a given topology and number of layers and nodes) is able to represent a given function, or whether it is merely trapped in some local minimum?
These are the results with 10 neurons in the hidden layer:
train it with some patterns
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
test it
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])
This is the definition of the test run (the only part of the code I modified):
def demo():
    # Teach network checkerboard function
    pat = [
        [[0.0, 0.0], [0.0]],
        [[0.0, 0.5], [1.0]],
        [[0.0, 1.0], [0.0]],
        [[0.5, 0.0], [1.0]],
        [[0.5, 0.5], [0.0]],
        [[0.5, 1.0], [1.0]],
        [[1.0, 0.0], [0.0]],
        [[1.0, 0.5], [1.0]],
        [[1.0, 1.0], [0.0]]
    ]

    # create a network with two input, 10 hidden, and one output nodes
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)
The universal approximation theorem shows that any continuous function (on a compact set) can be approximated arbitrarily well with one hidden layer. It does not require any kind of data separability; we are talking about arbitrary continuous functions.
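For reference, the classical sigmoidal form of the theorem (due to Cybenko and Hornik) can be stated roughly as follows:

```latex
% For any continuous f : K -> R on a compact K \subset R^n and any
% eps > 0, there exist a width N and parameters v_i, w_i, b_i such
% that a one-hidden-layer network is uniformly within eps of f:
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
```

Note that the theorem only guarantees that such parameters exist; it says nothing about whether gradient-based training will find them.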
In particular, if you have N hidden nodes, where N is the number of training samples, it is always possible to learn your training set perfectly (the network can simply memorize all input-output pairs).
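To illustrate that point, here is a minimal self-contained sketch (using NumPy rather than the bpnn-style class from the question, and with hyperparameters such as 20 hidden units, learning rate, and iteration count chosen as plausible assumptions) of a one-hidden-layer network memorizing the 9 checkerboard points:

```python
import numpy as np

rng = np.random.default_rng(0)

# The 9 checkerboard training points from the question
X = np.array([[0.0, 0.0], [0.0, 0.5], [0.0, 1.0],
              [0.5, 0.0], [0.5, 0.5], [0.5, 1.0],
              [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [0.0],
              [1.0], [0.0], [1.0],
              [0.0], [1.0], [0.0]])

H = 20  # hidden units; having at least as many as samples makes memorization easy
W1 = rng.normal(0.0, 1.0, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 1.0, (H, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                   # hidden layer
    o = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output
    return h, o

lr, n = 2.0, len(X)
for _ in range(30000):
    h, o = forward(X)
    # hand-written backprop for mean squared error with a sigmoid output
    d_o = (o - y) * o * (1.0 - o) / n
    d_h = (d_o @ W2.T) * (1.0 - h ** 2)
    W2 -= lr * (h.T @ d_o); b2 -= lr * d_o.sum(axis=0)
    W1 -= lr * (X.T @ d_h); b1 -= lr * d_h.sum(axis=0)

_, o = forward(X)
print(np.round(o.ravel(), 2))  # outputs should sit near the 0/1 targets
```

With enough capacity and enough iterations, all 9 outputs end up on the correct side of 0.5, which is exactly the memorization the theorem permits.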
There are, however, no guarantees about the generalization of such networks, nor are there any learning guarantees for smaller networks. Neural networks are not a "universal answer"; they are quite hard to handle correctly.
Getting back to your problem: your function is quite trivial, and none of the above concerns applies here; such a function can easily be learned by very basic networks. It looks like one of the two following aspects is the problem: