
Fit in distributed, predict in standalone

How can one train (fit) a model on a distributed big-data platform (e.g. Apache Spark), yet use that model on a standalone machine (e.g. a plain JVM) with as few dependencies as possible?

I have heard of PMML, but I am not sure whether it is enough. Spark 2.0 also supports persistent model saving, but I am not sure what is needed to load and run those saved models.

Apache Spark persistence is about saving and loading Spark ML pipelines in a JSON-based data format (think of it as Python's pickle mechanism, or R's RDS mechanism). These JSON data structures map directly to Spark ML classes, so they don't make sense on other platforms.
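A minimal sketch of what that persistence looks like in practice, assuming Spark 2.x; `pipeline`, `trainingData`, and the save path are placeholders, and note that the loading side still needs a full Spark runtime:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}

// Fitting side (on the cluster): persist the fitted pipeline.
// "hdfs:///models/my-pipeline" is a hypothetical path.
val pipelineModel: PipelineModel = pipeline.fit(trainingData)
pipelineModel.write.overwrite().save("hdfs:///models/my-pipeline")

// Loading side: this requires Spark itself on the classpath,
// because the saved artifacts map back to Spark ML classes.
val reloaded: PipelineModel = PipelineModel.load("hdfs:///models/my-pipeline")
```

This is exactly why Spark-native persistence does not solve the standalone-scoring problem: `PipelineModel.load` drags in the whole Spark dependency tree.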

As for PMML, you can convert Spark ML pipelines to PMML documents using the JPMML-SparkML library. You can then execute PMML documents (regardless of whether they came from Apache Spark, Python, or R) using the JPMML-Evaluator library. If you use Apache Maven to manage and build your project, JPMML-Evaluator can be pulled in by adding a single dependency declaration to your project's POM.
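For illustration, that single Maven dependency would look roughly like the fragment below; the version number is a placeholder, so check the JPMML-Evaluator project's README for the current release and artifact layout:

```xml
<!-- Version is hypothetical — see the JPMML-Evaluator README for the latest release. -->
<dependency>
  <groupId>org.jpmml</groupId>
  <artifactId>pmml-evaluator</artifactId>
  <version>1.4.15</version>
</dependency>
```

This is the whole appeal of the PMML route: the scoring side depends only on the evaluator library, not on Spark.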
