How to save models from ML Pipeline to S3 or HDFS?
I am trying to save thousands of models produced by ML Pipeline. As indicated in the answer here, the models can be saved as follows:
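import java.io._
import org.apache.spark.ml.PipelineModel

// Serialize one model to a file on the local filesystem.
def saveModel(name: String, model: PipelineModel): Unit = {
  val oos = new ObjectOutputStream(new FileOutputStream(s"/some/path/$name"))
  try oos.writeObject(model) finally oos.close()
}

// Save each model under its school's name.
schools.zip(bySchoolArrayModels).foreach {
  case (name, model) => saveModel(name, model)
}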
I have tried using s3://some/path/$name and /user/hadoop/some/path/$name, as I would like the models to be saved to Amazon S3 eventually, but both fail with messages indicating that the path cannot be found.
How can I save the models to Amazon S3?
One way to save a model to HDFS is as follows:
// persist model to HDFS
sc.parallelize(Seq(model), 1).saveAsObjectFile("hdfs:///user/root/linReg.model")
The saved model can then be loaded from the same path:
val linRegModel = sc.objectFile[LinearRegressionModel]("hdfs:///user/root/linReg.model").first()
Since Apache Spark 1.6, in the Scala API, you can save your models without using any tricks. Models in the ML library come with a save method; you can check this in LogisticRegressionModel, which does have that method. To load the model, you can use a static method on the model class:
val logRegModel = LogisticRegressionModel.load("myModel.model")
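Because the built-in writer goes through the Hadoop filesystem layer, the same call should accept HDFS or S3 URIs as well as local paths. Below is a minimal sketch for the pipeline case in the question. It assumes Spark 1.6 or later, that every stage of the pipeline supports persistence, and that the cluster has an S3 connector configured. The bucket name my-bucket, the models prefix, and the helper names are hypothetical, and the URI scheme depends on the connector (s3a:// with the Hadoop S3A connector, s3:// on EMR).

```scala
import org.apache.spark.ml.PipelineModel

object ModelStore {
  // Hypothetical base location; an hdfs:// URI works the same way.
  val basePath = "s3a://my-bucket/models"

  // Write one fitted pipeline under basePath/<name>, replacing any earlier copy.
  def saveModel(name: String, model: PipelineModel): Unit =
    model.write.overwrite().save(s"$basePath/$name")

  // Read a previously saved pipeline back.
  def loadModel(name: String): PipelineModel =
    PipelineModel.load(s"$basePath/$name")
}
```

With the names from the question, the loop would become schools.zip(bySchoolArrayModels).foreach { case (name, model) => ModelStore.saveModel(name, model) }.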
FileOutputStream writes to the local filesystem (it does not go through the Hadoop libraries), so s3:// and HDFS paths will not resolve; saving to a local directory is the way to make this approach work. The directory must already exist, so make sure it is created first.
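A minimal sketch of the local approach that creates the directory before writing, assuming the object being saved is Java-serializable. The LocalStore object and its method names are hypothetical, and only the standard library is used.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}
import java.nio.file.{Files, Path, Paths}

object LocalStore {
  // Serialize `obj` to dir/name, creating `dir` (and any missing parents) first.
  def saveObject(dir: String, name: String, obj: java.io.Serializable): Path = {
    val target = Files.createDirectories(Paths.get(dir)).resolve(name)
    val oos = new ObjectOutputStream(new FileOutputStream(target.toFile))
    try oos.writeObject(obj) finally oos.close()
    target
  }

  // Read the object back; the caller casts it to the expected type.
  def loadObject(path: Path): AnyRef = {
    val ois = new ObjectInputStream(new FileInputStream(path.toFile))
    try ois.readObject() finally ois.close()
  }
}
```

Files.createDirectories does nothing if the directory is already there, so the helper can be called repeatedly for thousands of models in the same directory.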
Depending on your model, you may also wish to look at PMML export: https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
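Note that the linked page covers the RDD-based spark.mllib models that mix in PMMLExportable, so it may not apply to a PipelineModel directly. A minimal sketch, assuming spark-mllib is on the classpath; the hand-built two-center KMeansModel and the output path are purely illustrative.

```scala
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

object PmmlExample {
  // Illustrative model; in practice it would come from KMeans.train.
  val model = new KMeansModel(Array(Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0)))

  // toPMML() returns the PMML document as a String.
  val pmml: String = model.toPMML()

  // toPMML(localPath) writes the document to a local file (hypothetical path).
  def writeLocal(): Unit = model.toPMML("/tmp/kmeans.xml")
}
```

There is also an overload taking a SparkContext and a path, model.toPMML(sc, path), which writes to a distributed filesystem such as HDFS.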