
ML and DL4J Phases by Example

I have a large S3 bucket full of photos of 4 different types of animals. My foray into ML will be to see whether I can successfully get Deep Learning 4 Java (DL4J) to take a new, arbitrary photo of one of those 4 species and consistently, correctly guess which animal it is.

My understanding is that I must first perform a "training phase", which effectively builds up an (in-memory) neural network consisting of nodes and weights derived both from this S3 bucket (the input data) and from my own code and usage of the DL4J library.
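To make the training phase concrete, here is a minimal sketch of what that looks like with DL4J. The layer sizes, image dimensions, and the `trainIterator` variable are all assumptions for illustration; in practice the iterator would be built from the S3 images with a data-loading pipeline such as DataVec.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Training phase: configure a network with 4 output classes (one per
// animal) and fit it on an iterator over the labelled images.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)                      // fixed seed for reproducibility
        .list()
        .layer(0, new ConvolutionLayer.Builder(5, 5)
                .nIn(3)                 // 3 channels for RGB photos
                .nOut(20)
                .activation(Activation.RELU)
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(4)                // 4 animal species
                .activation(Activation.SOFTMAX)
                .build())
        .setInputType(InputType.convolutional(64, 64, 3)) // assumed 64x64 RGB inputs
        .build();

MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.fit(trainIterator);  // trainIterator: a DataSetIterator over the S3 images
```

The result of `fit` is exactly the in-memory network of nodes and weights described above.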

Once trained (meaning, once I have an in-memory neural net built up), my understanding is that I can then run zero or more "testing phases" in which I give a single new image as input, let the program decide what type of animal it thinks the image is of, and then manually mark the output as correct (the program guessed right) or incorrect with corrections (the program guessed wrong, and by the way, such-and-such was the correct answer). My understanding is that these test phases should help you tweak your algorithms and minimize error.
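A testing phase along these lines can be sketched with DL4J's built-in `Evaluation` helper, which compares the network's guesses against the known labels so you don't have to mark each image by hand. The `net` and `testIterator` variables are assumptions carried over from a prior training step.

```java
import org.nd4j.evaluation.classification.Evaluation;

// Testing phase: run held-out labelled images through the trained network
// and tally correct vs. incorrect guesses per class.
Evaluation eval = net.evaluate(testIterator);  // testIterator: images the net never saw
System.out.println(eval.stats());              // accuracy, precision/recall, confusion matrix
```

The confusion matrix in `eval.stats()` shows which animals are being mistaken for which, which is the signal you would use to tweak the network and reduce error.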

Finally, it is my understanding that the library can then be used in a live "production phase" whereby the program simply responds to images as inputs and makes decisions about what it thinks they are.

All this to ask: is my understanding of ML and DL4J's basic methodology correct, or am I misled in any way?

Training: that's any framework. You can also persist the neural network, either with the Java-based SerializationUtils or, in the newer release, with the ModelSerializer.
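As a sketch of the ModelSerializer approach (the file name and the `net` variable are assumptions):

```java
import java.io.File;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

// Persist the trained network so the production phase does not retrain:
File modelFile = new File("animal-classifier.zip");
ModelSerializer.writeModel(net, modelFile, true);  // true = also save updater state

// Later (e.g. in the serving process), restore it:
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(modelFile);
```

Saving the updater state matters only if you intend to continue training the restored model; for inference-only production use it can be `false`.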

This is more of an integrations play than a "can it do x?" question.

DL4J can integrate with Kafka/Spark streaming and do online/mini-batch learning.

The neural nets are embeddable in a production environment.

My only tip here is to ensure that you have the same data pipeline for training as well as testing.

This is mainly to ensure consistency between the data you train on and the data you test on.

As for mini-batch learning: make sure you have minibatch(true) (the default) if you are doing mini-batch/online learning, or minibatch(false) if you are training on the whole dataset at once.
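In code, that flag sits on the network configuration builder; a fragment showing both settings:

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;

// Mini-batch/online learning (the default) - gradients are scaled
// appropriately for small batches of examples:
NeuralNetConfiguration.Builder online = new NeuralNetConfiguration.Builder()
        .miniBatch(true);

// Full-dataset training, fitting the entire dataset in one pass:
NeuralNetConfiguration.Builder fullBatch = new NeuralNetConfiguration.Builder()
        .miniBatch(false);
```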

I would also suggest using StandardScaler ( https://github.com/deeplearning4j/nd4j/blob/master/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/iterator/StandardScaler.java ) or something similar for persisting global statistics about your data. Much of the data pipeline will depend on the libraries you use to build it, though.

I would assume you would want to normalize your data in some way, though.
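One way to do this, sketched with ND4J's `NormalizerStandardize` (a similar preprocessor to the StandardScaler linked above; the iterator variables are assumptions), which also ties back to the earlier tip about keeping the training and testing pipelines identical:

```java
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

// Fit global mean/std statistics on the training data, then apply the
// SAME statistics to both iterators so the pipelines stay consistent.
NormalizerStandardize normalizer = new NormalizerStandardize();
normalizer.fit(trainIterator);              // computes per-feature mean and std dev
trainIterator.setPreProcessor(normalizer);
testIterator.setPreProcessor(normalizer);   // reuse the fitted stats; do NOT refit on test data
```

Refitting the normalizer on the test set would leak test statistics into the pipeline and make train and test inputs inconsistent, which is exactly the mismatch the tip above warns against.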
