Neural network with 1 hidden layer cannot learn checkerboard function?

I'm just starting to learn neural networks, to see if they could be useful for me.

I downloaded this simple Python code for a 3-layer feed-forward neural network

and I just modified the learning pattern from XOR to a checkerboard, and the number of nodes in the hidden layer to 10. If I understand the universal approximation theorem correctly, this 3-layer network (one hidden layer) should be able to learn any function from R2 to R, including my checkerboard function. ... But it does not.

What is wrong?

  • I understand the universal approximation theorem wrong - maybe the function should be monotonic or convex? (The area should be linearly separable?)
  • I need another layer (2 hidden layers) to approximate such a non-convex, non-linearly-separable function?
  • The network is just trapped in some local minimum? (But I don't think so? I tried several runs, the initial weights are random, but the result is the same.)
  • 10 nodes in the hidden layer is not enough? I tried different numbers - with 5 it is almost the same. With 30 it does not work either.

Is there any general approach to modifying the optimization (learning) scheme so that it is sure to converge to any function which can theoretically be described by the given network according to the universal approximation theorem?

Is there any general test which will tell me whether my network (with the given topology, number of layers and nodes) is able to describe a given function, or whether it is just trapped in some local minimum?

These are the results with 10 neurons in the hidden layer:

 train it with some patterns 
error 3.14902
error 1.37104
error 1.35305
error 1.30453
error 1.28329
error 1.27599
error 1.27275
error 1.27108
error 1.27014
error 1.26957
 test it 
([0.0, 0.0], '->', [0.019645293674000152])
([0.0, 0.5], '->', [0.5981006916165954])
([0.0, 1.0], '->', [0.5673621981298169])
([0.5, 0.0], '->', [0.5801274708105488])
([0.5, 0.5], '->', [0.5475774428347904])
([0.5, 1.0], '->', [0.5054692523873793])
([1.0, 0.0], '->', [0.5269586801603834])
([1.0, 0.5], '->', [0.48368767897171666])
([1.0, 1.0], '->', [0.43916379836698244])

This is the definition of the test run (the only part of the code I modified):

def demo():
    # Teach network checkerboard function
    pat = [
        [ [0.0,0.0], [0.0] ],
        [ [0.0,0.5], [1.0] ],
        [ [0.0,1.0], [0.0] ],

        [ [0.5,0.0], [1.0] ],
        [ [0.5,0.5], [0.0] ],
        [ [0.5,1.0], [1.0] ],

        [ [1.0,0.0], [0.0] ],
        [ [1.0,0.5], [1.0] ],
        [ [1.0,1.0], [0.0] ]
        ]

    # create a network with two inputs, 10 hidden, and one output node
    n = NN(2, 10, 1)
    print " train it with some patterns "
    n.train(pat)
    print " test it "
    n.test(pat)

The Universal Approximation Theorem shows that any continuous function can be arbitrarily well approximated with one hidden layer. It does not require any kind of data separability; we are talking about arbitrary functions.
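
For reference, the one-hidden-layer approximator the theorem refers to is a finite sum of sigmoidal units of the following form (a sketch of the Cybenko-style statement; note that each hidden unit carries its own bias term b_i):

f(x) \approx \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^\top x + b_i)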

In particular, if you have N hidden nodes, where N is the number of training samples, then it is always possible to learn your training set perfectly (the network can simply memorize all input-output pairs).

There are, however, no guarantees about the generalization of such objects, nor are there learning guarantees for smaller networks. Neural networks are not a "universal answer"; they are quite hard to handle correctly.

Getting back to your problem: your function is quite trivial, and none of the above concerns apply here; such a function can easily be learned by very basic networks. It looks like one of the following two aspects is the problem (a minimal working sketch follows the list):

  • implementation error
  • lack of correct activation functions in the neurons (and/or missing bias terms)
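
To illustrate the second point, here is a minimal NumPy sketch of a 2-10-1 network with sigmoid activations and explicit bias terms, trained by plain batch gradient descent on the nine checkerboard points from the question. It is not the bpnn.py code referenced above; the hidden size, learning rate, number of epochs and random seed are assumptions that happen to work on this tiny problem - if a run stalls in a poor minimum, re-running with a different seed or learning rate usually fixes it.

import numpy as np

# 3x3 checkerboard training set from the question: the target alternates 0/1
X = np.array([[0.0, 0.0], [0.0, 0.5], [0.0, 1.0],
              [0.5, 0.0], [0.5, 0.5], [0.5, 1.0],
              [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [0.0],
              [1.0], [0.0], [1.0],
              [0.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_hidden = 10                            # same hidden size as in the question
W1 = rng.normal(0.0, 1.0, (2, n_hidden))
b1 = np.zeros(n_hidden)                  # bias terms: without them every unit's boundary passes through the origin
W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
b2 = np.zeros(1)

lr = 0.5                                 # hand-picked learning rate for this tiny problem
for epoch in range(20000):
    # forward pass
    H = sigmoid(X @ W1 + b1)             # hidden activations, shape (9, n_hidden)
    Y = sigmoid(H @ W2 + b2)             # outputs in (0, 1), shape (9, 1)
    # backward pass: squared-error loss, sigmoid derivative y * (1 - y)
    dY = (Y - T) * Y * (1.0 - Y)
    dH = (dY @ W2.T) * H * (1.0 - H)
    W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(axis=0)
    W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)
    if epoch % 5000 == 0:
        print("error", 0.5 * float(((Y - T) ** 2).sum()))

for x, t in zip(X, T):
    y = sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)
    print(x.tolist(), '->', y.item(), '  target', t.item())

With settings like these the training error typically drops far below the ~1.27 plateau shown in the question and the nine grid points approach their 0/1 targets, which supports the point that the function itself is easy: a plateau like the one above points to the training setup (activations, bias terms, learning rate) rather than to a limitation of a single hidden layer.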
