How do I train my own model in AWS SageMaker?

Question

I just started with AWS and I want to train my own model on my own dataset. My model is a Keras model with a TensorFlow backend, written in Python. I read some documentation, which says I need a Docker image to load my model. So, how do I convert a Keras model into a Docker image? I searched the internet but found nothing that explains the process clearly. How do I make a Docker image of a Keras model, and how do I load it into SageMaker? Also, how do I load my data from an h5 file into an S3 bucket for training? Can anyone please help me with a clear explanation?

Answer 1

Although you can load a Docker container into SageMaker for production, it sounds like you would be better served by working through the entire SageMaker pipeline, starting with your data in S3 and training via a Jupyter notebook, which supports Keras and TF.

Once you have a trained model, the documentation walks through how to store and persist it for production. For Docker, you would build your Docker container, push it to AWS ECR, and import it from there. Note that the awslabs examples follow a very specific Docker directory structure that you need to match, for example:

https://github.com/awslabs/amazon-sagemaker-examples/blob/caa8ce243b51f6bdb15f2afc638d9c4e2ad436b9/hyperparameter_tuning/keras_bring_your_own/trainer/environment.py

Answer 2

You can convert your Keras model to a tf.estimator and train using the TensorFlow framework estimators in SageMaker.

This conversion is pretty basic, though. I reimplemented my models in TensorFlow using the tf.keras API, which keeps the model nearly identical, and I train with the SageMaker TF estimator in script mode.
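As a rough sketch of the script-mode route, assuming version 2 of the SageMaker Python SDK: the helper names `tf_estimator_kwargs` and `launch_training` are made up for illustration, and the instance type, framework version, and Python version are placeholders that you would replace with a combination your account and region support. The entry point is your own tf.keras training script.

```python
def tf_estimator_kwargs(entry_point, role_arn, instance_type="ml.m5.xlarge"):
    """Collect keyword arguments for sagemaker.tensorflow.TensorFlow.

    Argument names follow SageMaker Python SDK v2; the version strings
    below are placeholders, not recommendations.
    """
    return {
        "entry_point": entry_point,    # path to your tf.keras training script
        "role": role_arn,              # IAM role that SageMaker assumes
        "instance_count": 1,
        "instance_type": instance_type,
        "framework_version": "2.11",   # placeholder TensorFlow version
        "py_version": "py39",          # placeholder Python version
    }


def launch_training(kwargs, train_s3_uri):
    """Start a training job; needs the sagemaker package and AWS credentials."""
    from sagemaker.tensorflow import TensorFlow  # third-party, imported lazily

    estimator = TensorFlow(**kwargs)
    # The dict key becomes the channel name seen by the training script.
    estimator.fit({"train": train_s3_uri})
    return estimator
```

Calling `launch_training(tf_estimator_kwargs("train.py", "<your-role-arn>"), "s3://my-sagemaker-bucket/folder")` would start a job; `train.py` and the role ARN here are stand-ins for your own values.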

My initial approach using pure Keras models was based on bring-your-own-algorithm containers, similar to the answer by Matthew Arthur.

Answer 3

A good starting point for writing your own custom algorithms is the Scikit Building Your Own Algorithm Container tutorial. It gives you an overview of Docker and the steps for packaging your script into a container, uploading it, and running a training job.

If you already have an HDF5 file, you can just use the AWS CLI to upload it to a bucket you own.

$ aws s3 cp ./path/to/file.h5 s3://my-sagemaker-bucket/folder/file.h5
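If you would rather do the upload from Python, a minimal sketch follows. It assumes boto3 is installed and credentials are configured; `upload_h5` is a hypothetical helper name, and the client is passed in so the function only relies on the S3 client's `upload_file(Filename, Bucket, Key)` method.

```python
from pathlib import Path


def upload_h5(s3_client, local_path, bucket, key):
    """Upload a local file to s3://bucket/key and return that URI.

    s3_client is expected to behave like boto3.client("s3"); only its
    upload_file(Filename, Bucket, Key) method is used.
    """
    path = Path(local_path)
    if not path.is_file():
        raise FileNotFoundError(f"no such file: {path}")
    s3_client.upload_file(str(path), bucket, key)
    return f"s3://{bucket}/{key}"


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   upload_h5(boto3.client("s3"), "./path/to/file.h5",
#             "my-sagemaker-bucket", "folder/file.h5")
```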

Then, when creating your training job, you can specify an input channel ( http://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html#SageMaker-CreateTrainingJob-request-InputDataConfig ):

[{ 
  "ChannelName": "train",
  "DataSource": {
    "S3DataSource": {
      "S3Uri": "s3://my-sagemaker-bucket/folder",
      "S3DataType": "S3Prefix",
      "S3DataDistributionType": "FullyReplicated"
    }
  }
}]
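The same channel definition can be built in Python, for instance when the training job is created through boto3 and this list is passed as its `InputDataConfig` argument. This is only a sketch: `s3_channel` is a hypothetical helper, and the bucket and prefix are the example values from above.

```python
import json


def s3_channel(name, s3_prefix):
    """Return one InputDataConfig entry for a fully replicated S3 prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3Uri": s3_prefix,
                "S3DataType": "S3Prefix",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# One channel named "train", pointing at the folder that holds file.h5.
input_data_config = [s3_channel("train", "s3://my-sagemaker-bucket/folder")]
print(json.dumps(input_data_config, indent=2))
```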

When the training job begins, your containerized script should be able to find the file on its local filesystem at /opt/ml/input/data/train/file.h5 and read it like a normal file. Note that "train" in this file path corresponds to the channel name you specified in the input data config.
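Inside the container, the training script can locate the data along these lines. This is a sketch under assumptions: `channel_dir` and `find_h5_files` are hypothetical helpers, and the `SM_CHANNEL_<NAME>` environment variable is only present when the container uses SageMaker's training toolkit (as the framework containers do), so the code falls back to the /opt/ml/input/data/<channel> path described above.

```python
import os
from pathlib import Path


def channel_dir(channel, base="/opt/ml/input/data"):
    """Return the directory where the given input channel is mounted."""
    # Prefer the toolkit-provided variable if it exists, else the default path.
    env = os.environ.get(f"SM_CHANNEL_{channel.upper()}")
    return Path(env) if env else Path(base) / channel


def find_h5_files(channel="train", base="/opt/ml/input/data"):
    """List the .h5 files in a channel directory, sorted by name."""
    return sorted(channel_dir(channel, base).glob("*.h5"))
```

Each returned path can then be opened like any local file, for example with `h5py.File(path, "r")` if h5py is installed in the image.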

You can read more at https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html

Answer 4

Keras is now natively supported in SageMaker, with both the TensorFlow and MXNet built-in frameworks. You can train and deploy with SageMaker, or you can import existing Keras models in TensorFlow Serving format and deploy them.

Here's a detailed example: https://aws.amazon.com/blogs/machine-learning/train-and-deploy-keras-models-with-tensorflow-and-apache-mxnet-on-amazon-sagemaker/


Answer 1
0 2018-12-25 12:33:43

Answer 2
0 2019-01-08 10:37:20

Answer 3
0 2019-01-11 22:25:59

Answer 4
0 2019-07-19 13:13:36
