
Neural network with 1 hidden layer cannot learn checkerboard function?

I'm just starting to learn neural networks, to see if they could be useful for me.

I downloaded this simple Python code for a 3-layer feed-forward neural network

and I just modified the training pattern from XOR to a checkerboard, and the number of nodes in the hidden layer to 10. If I understand the universal approximation theorem correctly, this 3-layer network (one hidden layer) should be able to learn any function from R² to R, including my checkerboard function. ... But it does not.

What is wrong?

  • Do I understand the universal approximation theorem wrong? Maybe the function has to be monotonic or convex? (Or the regions linearly separable?)
  • Do I need another layer (2 hidden layers) to approximate such a non-convex, non-linearly-separable function?
  • Is the network just trapped in some local minimum? (But I don't think so: I tried several runs with random initial weights, and the result is the same.)
  • Are 10 nodes in the hidden layer not enough? I tried different numbers: with 5 it is almost the same, and with 30 it does not improve either.

Is there a general approach to modifying the optimization (learning) scheme so that it is sure to converge to any function that the given network can theoretically represent according to the universal approximation theorem?

Is there a general test that will tell me whether my network (with a given topology, number of layers and nodes) is able to represent a given function, or whether it is just trapped in some local minimum?

These are the results with 10 neurons in the hidden layer:

 train it with some patterns 
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
 test it 
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])

This is the definition of the test run (the only part of the code I modified):

def demo():
    # Teach the network the checkerboard function.
    # Each pattern is [input, target]: a 3x3 grid of points
    # with alternating 0/1 targets.
    pat = [
        [ [0.0,0.0], [0.0] ],
        [ [0.0,0.5], [1.0] ],
        [ [0.0,1.0], [0.0] ],

        [ [0.5,0.0], [1.0] ],
        [ [0.5,0.5], [0.0] ],
        [ [0.5,1.0], [1.0] ],

        [ [1.0,0.0], [0.0] ],
        [ [1.0,0.5], [1.0] ],
        [ [1.0,1.0], [0.0] ]
        ]

    # Create a network with two input, 10 hidden, and one output node.
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)

The universal approximation theorem shows that any continuous function can be approximated arbitrarily well with one hidden layer. It does not require any kind of data separability; it holds for arbitrary continuous functions.
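
For reference, the theorem's standard (Cybenko-style) statement: for any continuous f on [0,1]² and any ε > 0 there exist N, weights w_i, biases b_i and coefficients α_i (these symbols are from the theorem statement, not from the code above) such that

    G(x) = \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
    \qquad
    \sup_{x \in [0,1]^2} \lvert G(x) - f(x) \rvert < \varepsilon

where σ is a sigmoidal activation. Note that the bias terms b_i are part of the construction.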

In particular, if you have N hidden nodes, where N is the number of training samples, then it is always possible to fit your training set perfectly (the network can simply memorize all input-output pairs).

There are, however, no guarantees about the generalization of such networks, nor are there learning guarantees for smaller networks. Neural networks are not a "universal answer"; they are quite hard to handle correctly.

Getting back to your problem: your function is quite simple, and none of the above concerns apply here; such a function can easily be learned by very basic networks. It looks like one of the following two aspects is the problem (see the sketch after this list):

  • implementation error
  • lack of suitable activation functions in the neurons (and/or missing bias terms)
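
As a sanity check, here is a minimal sketch (Python 3 with NumPy, not the asker's NN class) of a one-hidden-layer network with tanh hidden units, a sigmoid output, bias terms, and plain full-batch gradient descent on squared error. On the 9-point checkerboard above it should drive the training error close to zero with 10 hidden units; the learning rate, initialization and step count are arbitrary choices for this toy problem:

import numpy as np

# 3x3 checkerboard: the same patterns as in the question.
X = np.array([[0.0, 0.0], [0.0, 0.5], [0.0, 1.0],
              [0.5, 0.0], [0.5, 0.5], [0.5, 1.0],
              [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [0.0],
              [1.0], [0.0], [1.0],
              [0.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
n_hidden = 10
# Weights and biases for input->hidden and hidden->output.
W1 = rng.normal(0.0, 1.0, (2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(20000):
    # Forward pass: tanh hidden layer, sigmoid output.
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass for the squared error 0.5 * sum((out - y)^2).
    d_out = (out - y) * out * (1.0 - out)   # delta at the output
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)   # delta at the hidden layer

    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

    if step % 2000 == 0:
        print("error", 0.5 * np.sum((out - y) ** 2))

# Test: print predictions for each training point.
for inp, pred in zip(X, sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)):
    print(inp, '->', pred)

If this converges while the downloaded code does not, the issue is in that code (e.g. missing biases or an unsuitable activation), not in the capacity of a one-hidden-layer network.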
