
Neural network toy model to fit sine function fails, what's wrong?

I'm a graduate student, new to Keras and neural networks, and I was trying to fit a very simple feedforward neural network to a one-dimensional sine wave.

Below are three examples of the best fit I could get. On the plots, you can see the output of the network vs. the ground truth.

[Plots: neural network output vs. ground truth, runs 1-3]

The complete code, just a few lines, is posted here: example Keras.


I played with the number of layers, different activation functions, different initializations, different loss functions, the batch size, and the number of training samples. None of those seemed to improve the results beyond the examples above.

I would appreciate any comments and suggestions. Is sine a hard function for a neural network to fit? I suspect the answer is no, so I must be doing something wrong...


There is a similar question here from 5 years ago, but the OP there didn't provide the code, and it is still not clear what went wrong or how he was able to resolve the problem.

In order to make your code work, you need to:

  • scale the input values into the [-1, +1] range (neural networks don't like big values)
  • scale the output values as well, since the tanh activation doesn't work well close to +/-1
  • use the relu activation instead of tanh in all but the last layer (it converges much faster)

With these modifications, I was able to run your code with two hidden layers of 10 and 25 neurons; a sketch along those lines is shown below.
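A minimal sketch of those modifications, assuming a sine with a few periods over the input range (the frequency, optimizer, epoch count, and target scaling factor here are assumptions, not taken from the linked code):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
t = np.arange(T, dtype=np.float32)
X = 2 * t / (T - 1) - 1               # scale inputs into [-1, +1]
Y = 0.9 * np.sin(0.02 * t)            # keep targets inside (-1, +1), away from tanh saturation

model = Sequential([
    Dense(10, activation='relu', input_shape=(1,)),   # relu in the hidden layers
    Dense(25, activation='relu'),
    Dense(1, activation='tanh'),                      # tanh only on the output layer
])
model.compile(optimizer='adam', loss='mse')
model.fit(X.reshape(-1, 1), Y, epochs=500, batch_size=32, verbose=0)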

Since there is already an answer that provides a workaround, I'm going to focus on problems with your approach.

Input data scale

As others have stated, the range of your input data, from 0 to 1000, is quite big. This problem can be easily solved by scaling your input data to zero mean and unit variance ( X = (X - X.mean())/X.std() ), which will improve training performance. For tanh this improvement can be explained by saturation: tanh maps to [-1, 1] and will therefore return either -1 or 1 for almost all sufficiently big (>3) x, ie it saturates. In saturation the gradient of tanh will be close to zero and nothing will be learned. Of course, you could also use ReLU instead, which won't saturate for values > 0; however, you will have a similar problem, as the gradients then depend (almost) solely on x, and therefore later inputs will always have a higher impact than earlier inputs (among other things).
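A minimal sketch of that standardization step, assuming X is the NumPy array of raw inputs from the question:

import numpy as np

T = 1000
X = np.arange(T, dtype=np.float64)

# zero mean, unit variance: keeps tanh out of its saturated range
X_scaled = (X - X.mean()) / X.std()

print(X_scaled.mean(), X_scaled.std())   # approximately 0.0 and 1.0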

While re-scaling or normalization may be one solution, another would be to treat your input as categorical and map the discrete values to a one-hot encoded vector, so instead of

>>> X = np.arange(T)
>>> X.shape
(1000,)

you would have

>>> X = np.eye(len(X))
>>> X.shape
(1000, 1000)

Of course, this might not be desirable if you want to learn continuous inputs.

Modeling

You are currently trying to model a mapping from a linear function to a non-linear function: you map f(x) = x to g(x) = sin(x). While I understand that this is a toy problem, this way of modeling is limited to only this one curve, as f(x) is in no way related to g(x). As soon as you try to model different curves, say both sin(x) and cos(x), with the same network, you will have a problem with your X, as it has exactly the same values for both curves. A better approach to modeling this problem is to predict the next value of the curve, ie instead of

X = np.arange(T)
Y = np.sin(X)

you want

s = np.sin(np.arange(T))
X = s[:-1]
Y = s[1:]

So for time-step 2 you get the y value of time-step 1 as input, and your loss expects the y value of time-step 2. This way you implicitly model time. A sketch of this setup is shown below.
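A minimal sketch of this next-value formulation (the sine frequency, layer sizes, and training settings are assumptions, not taken from the question's code):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

T = 1000
s = np.sin(0.02 * np.arange(T, dtype=np.float32))   # assumed frequency: a few periods over T steps

X = s[:-1].reshape(-1, 1)   # value at time-step t
Y = s[1:]                   # value at time-step t + 1

model = Sequential([
    Dense(32, activation='relu', input_shape=(1,)),
    Dense(1, activation='tanh'),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, epochs=200, batch_size=32, verbose=0)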
