
How to train huge sets with Encog Java?

I have a huge set of data to be trained (gigabytes of data).

Is there any way to load and unload the data as it is needed?

Would it be better to divide it into small pieces, say 100 MB each, and train on each subset until its error is acceptable? And when that is done, start over again until all the errors are good enough?

Thanks

So, did you try out what happens when you train with all the data?

This should be possible with Encog's BufferedNeuralDataSet:

This class is not memory based, so very long files can be used, without running out of memory. This dataset uses an Encog binary training file as a buffer. When used with a slower access dataset, such as CSV, XML or SQL, where parsing must occur, this dataset can be used to load from the slower dataset and train at much higher speeds.
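As a sketch of that workflow (assuming Encog 3.x, where the buffered class is named `BufferedMLDataSet`; file names and the 10-input/1-output column layout below are placeholders for your own data):

```java
import java.io.File;

import org.encog.Encog;
import org.encog.ml.data.buffer.BinaryDataLoader;
import org.encog.ml.data.buffer.BufferedMLDataSet;
import org.encog.ml.data.buffer.codec.CSVDataCODEC;
import org.encog.ml.data.buffer.codec.DataSetCODEC;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;
import org.encog.util.csv.CSVFormat;
import org.encog.util.simple.EncogUtility;

public class BufferedTrainingSketch {
    public static void main(String[] args) {
        // One-time step: convert the slow-to-parse CSV into Encog's
        // binary .egb format. "huge.csv"/"huge.egb" and the column
        // counts are hypothetical placeholders.
        DataSetCODEC codec = new CSVDataCODEC(
                new File("huge.csv"), CSVFormat.ENGLISH,
                false,  // no header row
                10,     // input columns
                1,      // ideal (target) columns
                false); // no significance column
        new BinaryDataLoader(codec).external2Binary(new File("huge.egb"));

        // The buffered data set streams records from the binary file
        // on demand, so the full gigabytes never sit in memory at once.
        BufferedMLDataSet trainingSet =
                new BufferedMLDataSet(new File("huge.egb"));

        // Any network/trainer works here; RPROP on a small feedforward
        // net is just an illustrative choice.
        BasicNetwork network = EncogUtility.simpleFeedForward(10, 20, 0, 1, true);
        ResilientPropagation train = new ResilientPropagation(network, trainingSet);

        do {
            train.iteration();
            System.out.println("Error: " + train.getError());
        } while (train.getError() > 0.01);

        train.finishTraining();
        trainingSet.close();
        Encog.getInstance().shutdown();
    }
}
```

The conversion step pays the CSV parsing cost once; every subsequent epoch then reads the fixed-width binary buffer directly, which is what gives the "much higher speeds" the documentation mentions.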

Furthermore, I don't think you'll get good results training only on small subsets: you lower the error on the first subset, then retrain on the second subset, which potentially contains very different data, thus training the network to an error that won't be good for the first set, and so on...
