
Suggestions needed about the generalization of a regression neural network

I've trained a deep neural network with a few hundred features which analyzes geo data of a city and calculates a score per sample based on the profile between the observer and the target location. That is, the longer the distance between the observer and the target, the more features the sample will have. When I train my NN with samples from part of a city and test with other parts of the same city, the NN works very well, but when I apply my NN to other cities, the NN starts to give a high standard deviation of errors, especially in cases where the samples of the city I'm applying the NN to generally have more features than the samples of the city I used to train it.

To deal with that, I've appended 10% empty samples in training, which was able to reduce the errors by half, but the remaining errors are still too large compared to the solutions calculated by hand. May I have some advice on generalizing a regression neural network? Thanks!

I was going to ask for more examples of your data, and your network, but it wouldn't really matter.

How to improve the generalization of a regression neural network?

You can use exactly the same things you would use for a classification neural network. The only difference is what it does with the numbers that are output from the penultimate layer!
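To make that concrete, here is a minimal sketch showing that only the final layer and the loss change between the two. The layer widths, the 300-feature input, and the 10-class output are placeholder assumptions, not taken from your setup:

from keras.models import Sequential
from keras.layers import Dense

# Regression head: one linear output unit, trained with a distance-style loss.
reg_model = Sequential()
reg_model.add(Dense(64, activation='relu', input_dim=300))  # placeholder widths
reg_model.add(Dense(1))  # linear activation by default: emits the score directly
reg_model.compile(optimizer='adam', loss='mse')

# Classification head on the same body: softmax over classes, cross-entropy loss.
clf_model = Sequential()
clf_model.add(Dense(64, activation='relu', input_dim=300))
clf_model.add(Dense(10, activation='softmax'))  # 10 classes is a placeholder
clf_model.compile(optimizer='adam', loss='categorical_crossentropy')

Everything before the final layer, and all the regularization techniques below, can stay identical.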

I've appended 10% of empty samples in training which was able to reduce the errors by half,

I didn't quite understand what that meant (so I'd still be interested if you expanded your question with some more concrete details), but it sounds a bit like using dropout. In Keras you add a Dropout() layer between your other layers:

from keras.layers import Dense, Dropout

...
model.add(Dense(...))
model.add(Dropout(0.2))  # drop 20% of the previous layer's activations during training
model.add(Dense(...))
...

0.2 means 20% dropout, which is a nice starting point: you could experiment with values up to about 0.5. You could read the original paper, or this article seems to be a good introduction with Keras examples.

The other generic technique is to add some L1 and/or L2 regularization; here is the manual entry.
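As a sketch of what that looks like in Keras (assuming a Sequential model as above; the layer width and the 1e-6 strengths are just illustrative starting values, not a recommendation for your data):

from keras import regularizers
from keras.layers import Dense

# l1_l2 combines both penalties; set either strength to 0 to disable it.
model.add(Dense(64,
                activation='relu',
                kernel_regularizer=regularizers.l1_l2(l1=1e-6, l2=1e-6)))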

I typically use a grid search to experiment with each of these, e.g. trying each of 0, 1e-6, and 1e-5 for each of L1 and L2, and each of 0, 0.2, and 0.4 for dropout (usually using the same value between all layers, for simplicity). (If 1e-5 is best, I might also experiment with 5e-4 and 1e-4.)
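A minimal sketch of that kind of grid search, assuming a hypothetical build_model(l1, l2, dropout) helper that assembles and compiles the network, plus in-memory X_train/y_train and held-out X_val/y_val arrays (all of those names are assumptions, not part of your code):

import itertools

best = None
for l1, l2, dropout in itertools.product([0, 1e-6, 1e-5],   # L1 strengths
                                         [0, 1e-6, 1e-5],   # L2 strengths
                                         [0, 0.2, 0.4]):    # dropout rates
    model = build_model(l1=l1, l2=l2, dropout=dropout)  # hypothetical helper
    model.fit(X_train, y_train, epochs=50, verbose=0)
    val_loss = model.evaluate(X_val, y_val, verbose=0)
    if best is None or val_loss < best[0]:
        best = (val_loss, l1, l2, dropout)

print('best val loss %.4f with l1=%g, l2=%g, dropout=%g' % best)

The point is just to keep the comparison fair: train each combination the same way and pick the one with the lowest validation loss.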

But remember that even better than the above is more training data. Also consider using domain knowledge to add more data, or more features.
