How can I embed H2O in a Java application?

I am trying to start an embedded H2O instance in a Java application and train a model. However, I don't understand what exactly is explained in the documentation ( http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq/java.html ). Can anyone help me by providing an example?

Thanks,

The critical thing to understand here is whether you really want to train a model in your application, or whether you just want to score a model. Most people initially will just want to score a model.

SCORING

Scoring is easy and natural. See the MOJO and POJO javadoc API here:

Follow the pattern shown in the javadoc to use the Easy API. A snippet of the relevant code is included below:

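import hex.genmodel.MojoModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.BinomialModelPrediction;
// Load the exported MOJO and wrap it in the Easy API.
EasyPredictModelWrapper model = new EasyPredictModelWrapper(MojoModel.load("GBM_model.zip"));
RowData row = new RowData();
row.put("AGE", "68");
...
BinomialModelPrediction p = model.predictBinomial(row);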

SCORING AND SAVING FOR DEFERRED TRAINING

What many people will do is score in their live application, and also save new data (somewhere) for deferred training. Then they train models offline and push them into production again for scoring. This is a pretty typical model lifecycle which is easy to understand and manage.
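As a rough illustration of that lifecycle, here is a minimal sketch that scores a row and appends the inputs plus the prediction to a CSV file for later offline training. Everything in it is hypothetical: the `ScoreAndLog` class, the `scorer` function (a stand-in for a call such as `model.predictBinomial(row)`), and the file layout are assumptions, not part of H2O. It also does no CSV escaping and does not record the true outcome, which you would normally join in later before retraining.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: scores a row and saves it for deferred training. */
final class ScoreAndLog {
    // Stand-in for the real model call; not an H2O type.
    private final Function<Map<String, String>, String> scorer;
    private final Path log;
    private final List<String> columns;

    ScoreAndLog(Function<Map<String, String>, String> scorer, Path log, List<String> columns) {
        this.scorer = scorer;
        this.log = log;
        this.columns = columns;
    }

    /** Returns the predicted label and appends "col1,col2,...,label" to the log file. */
    String score(Map<String, String> row) throws IOException {
        String label = scorer.apply(row);
        StringBuilder line = new StringBuilder();
        for (String column : columns) {
            // Missing values are written as empty fields.
            line.append(row.getOrDefault(column, "")).append(',');
        }
        line.append(label).append('\n');
        Files.writeString(log, line.toString(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return label;
    }
}
```

The offline training job can then read the accumulated file with whichever H2O client you prefer, and the retrained model is pushed back into the application for scoring.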

TRAINING

Embedding H2O inside your application for actual training is more involved.

If I were going to embed H2O, I would do it one of two ways:

Well-supported option 1. Start an H2O instance as a separate process (or set of processes in the distributed case) and communicate with it using R or Python.

The well-documented APIs for H2O are the R API and the Python API. (There is also a REST API with lots of generated documentation, but I would not consider that particularly easy to use.)
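If the Java application itself should be the one to start that separate process, a minimal sketch could look like the following. The `H2oLauncher` class is hypothetical, and the jar location, heap size, and port are placeholder assumptions; the command line follows the usual `java -jar h2o.jar` startup with the `-port` flag.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper that launches a standalone H2O node as a child process. */
final class H2oLauncher {
    /** Builds the command line, e.g. java -Xmx4g -jar h2o.jar -port 54321. */
    static List<String> command(String jarPath, int port, String heap) {
        return List.of("java", "-Xmx" + heap, "-jar", jarPath,
                "-port", Integer.toString(port));
    }

    /** Starts the process and forwards its output to this application's console. */
    static Process start(String jarPath, int port, String heap) throws IOException {
        return new ProcessBuilder(command(jarPath, port, heap))
                .inheritIO()
                .start();
    }
}
```

Once the node is up, an R or Python client can connect to it on the chosen port and drive the training from there.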

You will find lots of documentation and examples at:

Well-supported option 2. Write a Spark application and use Sparkling Water and Scala or PySparkling and Python.

This doesn't actually require much Spark, since the embedded H2O inside Sparkling Water doesn't actually rely on the Spark side at all. The Scala and Python APIs for Sparkling Water are well documented. The Sparkling Water User Guide is a good place to start for this:

... And then here are other options which are harder:

(Harder) Option 3. You can include H2O as a Maven dependency and call it directly from Java.

The biggest problem here is that the Java API is not well documented, and you won't find friendly examples of how to use it. The best documentation for the Java API is the source code itself, and the unit tests (search for 'test' directories) inside the h2o-3 project on GitHub here:

(Harder) Option 4. Some people have called H2O directly from the REST API.

I wouldn't recommend this because it's difficult, but if you want to try, the best way to learn how to use the REST API is to turn on logging from R and look at the message payloads between the R client and H2O:

# R program.
h2o.init()
h2o.startLogging()
h2o.importFile("test.csv")
...
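Once the log shows which endpoints the R client hits, the same calls can be replayed from Java with the JDK's built-in HTTP client. The sketch below is only an illustration: the `H2oRest` class is hypothetical, and the `/3/ImportFiles` path with its `path` parameter is an assumption about what the logged import request looks like, so check it against your own log output before relying on it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for replaying REST calls seen in the R client log. */
final class H2oRest {
    /** Builds the import URL; the endpoint name is an assumption to verify against the log. */
    static URI importFilesUri(String baseUrl, String path) {
        String encoded = URLEncoder.encode(path, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/3/ImportFiles?path=" + encoded);
    }

    /** Sends a GET request and returns the raw JSON response body. */
    static String get(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

For example, `H2oRest.get(H2oRest.importFilesUri("http://localhost:54321", "test.csv"))` would issue the request against a node running locally on port 54321 (again an assumed address).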
