
Neural network online training

I want to implement a simple feed-forward neural network to approximate the function y=f(x)=ax^2, where a is some constant and x is the input value.

The NN has one input node, one hidden layer with 1-n nodes, and one output node. For example, I input the value 2.0 -> the NN produces 4.0, and again I input 3.0 -> the NN produces 9.0 or close to it, and so on.

If I understand "online-training," the training data is fed one by one - meaning I input the value 2.0 -> I iterate with gradient descent 100 times, and then I pass the value 3.0, and I iterate another 100 times.

However, when I try to do this with my experimental/learning NN - I input the value 2.0 -> the error gets very small -> the output is very close to 4.0.

Now if I want to predict for the input 3.0 -> the NN produces 4.36 or something instead of 9.0. So the NN just learns the last training value.

How can I use online-training to get a Neural Network that approximates the desired function for a range [-d, d]? What am I missing?

The reason why I like online-training is that eventually I want to input a time series - and map that series to the desired function. This is beside the point, but in case someone was wondering.

Any advice would be greatly appreciated.

More info - I am activating the hidden layer with the Sigmoid function and the output layer with the linear one.
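For reference, here is a minimal NumPy sketch of the network I am describing - one input, a sigmoid hidden layer, and a linear output. The hidden width H and the initialization scale are arbitrary choices I made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

H = 8  # hidden width (arbitrary choice for this sketch)
W1 = rng.normal(scale=0.5, size=H)  # input -> hidden weights (single input)
b1 = np.zeros(H)                    # hidden biases
W2 = rng.normal(scale=0.5, size=H)  # hidden -> output weights
b2 = 0.0                            # output bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Forward pass: sigmoid hidden layer, linear output."""
    h = sigmoid(W1 * x + b1)  # hidden activations
    y = W2 @ h + b2           # single linear output
    return h, y
```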

The reason why I like online-training is that eventually I want to input a time series - and map that series to the desired function.

Recurrent Neural Networks (RNNs) are the state of the art for modeling time series. This is because they can take inputs of arbitrary length, and they can also use internal state to model the changing behavior of the series over time.

Training feedforward neural networks for time series is an older method which will generally not perform as well. They require a fixed-size input, so you must choose a fixed-size sliding time window, and they also don't preserve state, so it is hard to learn a time-varying function.
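For concreteness, a sliding time window just means converting the series into fixed-size (window, next value) training pairs. One way to build such windows:

```python
import numpy as np

def sliding_windows(series, window):
    """Turn a 1-D series into (input window, next value) training pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

X, y = sliding_windows([1, 2, 3, 4, 5, 6], window=3)
# X[0] is [1, 2, 3] and its target y[0] is 4, and so on.
```

Note that the window size is a hyperparameter you must fix up front, which is exactly the limitation mentioned above.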

I can find very little about "online training" of feedforward neural nets with stochastic gradient descent to model non-stationary behavior except for a couple of very vague references. I don't think this provides any benefit besides allowing you to train in real time when you are getting a stream of data one at a time. I don't think it will actually help you model time-dependent behavior.

Most of the older methods I can find in the literature about online learning for neural networks use a hybrid approach with a neural network and some other method that can help capture time dependencies. Again, these should all be inferior to RNNs, not to mention harder to implement in practice.

Furthermore, I don't think you are implementing online training correctly. It should be stochastic gradient descent with a mini-batch size of 1. Therefore, you only run one iteration of gradient descent on each training example per training epoch. Since you are running 100 iterations before moving on to the next training example, you are going too far down the error gradient with respect to that single example, resulting in serious overfitting to a single data point. This is why you get poor results on the next input. I don't think this is a justifiable method of training, nor do I think it will work for time series.
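As a sketch of what correct online training (SGD with mini-batch size 1) could look like for the y=ax^2 example - the hidden width, learning rate, sample count, and squared-error loss here are all my assumptions, not details from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
a, d = 1.0, 3.0   # target function y = a * x^2 on the range [-d, d]
H, lr = 16, 0.01  # hidden width and learning rate (arbitrary choices)

W1 = rng.normal(scale=0.5, size=H)  # input -> hidden weights
b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=H)  # hidden -> output weights
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return W2 @ sigmoid(W1 * x + b1) + b2

xs = rng.uniform(-d, d, size=200)  # training inputs drawn from the whole range

def mse():
    return np.mean([(predict(x) - a * x * x) ** 2 for x in xs])

mse_before = mse()
for epoch in range(500):
    rng.shuffle(xs)
    for x in xs:                           # exactly ONE gradient step per example
        h = sigmoid(W1 * x + b1)
        err = (W2 @ h + b2) - a * x * x    # dL/dy for L = 0.5 * (y - target)^2
        gh = err * W2 * h * (1.0 - h)      # backprop through the sigmoid layer
        W2 -= lr * err * h
        b2 -= lr * err
        W1 -= lr * gh * x
        b1 -= lr * gh
mse_after = mse()
```

The key point is that every training example gets a single small update per epoch, and the examples cover the whole range [-d, d], so no single data point dominates the fit.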

You haven't mentioned what your activations are or what your loss function is, so I can't comment on whether those are appropriate for the task.

Also, I don't think learning y=ax^2 is a good analogy for time series prediction. This is a static function that always gives the same output for a given input, regardless of the index of the input or the value of previous inputs.
