How to train your own model in AWS SageMaker?

I just started with AWS and I want to train my own model on my own dataset. My model is a Keras model with the TensorFlow backend, written in Python. I read some of the documentation, which says I need a Docker image to load my model. So how do I convert a Keras model into a Docker image? I searched the internet but found nothing that explains the process clearly. How do I make a Docker image of a Keras model, and how do I load it into SageMaker? Also, how do I load my data from an h5 file into an S3 bucket for training? Can anyone help me with a clear explanation?

Although you can load a Docker container into SageMaker for production, it sounds like you would be better served by working through the entire SageMaker pipeline, starting with your data in S3 and training via a Jupyter notebook, which supports Keras and TF.

Once you have a trained model, the documentation walks through how to store and persist it for production. For Docker, you would build your Docker container, push it to AWS ECR, and import it from there. Note that the awslabs examples follow a very specific Docker directory structure that you need to match (for example: https://github.com/awslabs/amazon-sagemaker-examples/blob/caa8ce243b51f6bdb15f2afc638d9c4e2ad436b9/hyperparameter_tuning/keras_bring_your_own/trainer/environment.py ).
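As a rough illustration of that directory structure, here is a minimal sketch of the paths a bring-your-own training container conventionally works with under `/opt/ml`. The helper names `container_paths` and `load_hyperparameters` are hypothetical and are not taken from the linked example; the `prefix` argument exists only so the code can be tried outside a container.

```python
import json
from pathlib import Path

# Conventional root of the SageMaker training-container filesystem.
# Overridable so the helpers can be exercised on a local machine.
DEFAULT_PREFIX = Path("/opt/ml")


def container_paths(prefix=DEFAULT_PREFIX):
    """Return the standard locations a training container reads and writes."""
    prefix = Path(prefix)
    return {
        # Hyperparameters passed to the training job, stored as JSON.
        "hyperparameters": prefix / "input" / "config" / "hyperparameters.json",
        # One subdirectory per input channel, e.g. input/data/train.
        "input_data": prefix / "input" / "data",
        # Anything written here is packaged as the model artifact.
        "model": prefix / "model",
        # A short failure description can be written here if training fails.
        "failure": prefix / "output" / "failure",
    }


def load_hyperparameters(prefix=DEFAULT_PREFIX):
    """Load hyperparameters; values arrive as strings and need casting."""
    path = container_paths(prefix)["hyperparameters"]
    if not path.exists():
        return {}
    with path.open() as handle:
        return json.load(handle)
```

Because every hyperparameter value is delivered as a string, a training script would typically cast them itself, for example `int(load_hyperparameters().get("epochs", "10"))`.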

You can convert your Keras model to a tf.estimator and train using the TensorFlow framework estimators in SageMaker.

This conversion is pretty basic, though. I reimplemented my models in TensorFlow using the tf.keras API, which keeps the model nearly identical, and trained them with the SageMaker TF estimator in script mode.
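To give a feel for script mode, here is a minimal sketch of the argument handling a training entry point might use. It assumes the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables that SageMaker framework containers set; the hyperparameter names `--epochs` and `--batch-size` are purely illustrative, and the model-building code itself is left out.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided locations."""
    parser = argparse.ArgumentParser()

    # Illustrative hyperparameters; a real script defines its own.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)

    # Local directory whose contents are saved as the model artifact.
    parser.add_argument(
        "--sm-model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # Local directory holding the data for the "train" channel.
    parser.add_argument(
        "--train",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )

    # parse_known_args ignores extra arguments the platform may pass.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

A tf.keras script would then read its data from `args.train`, fit the model for `args.epochs` epochs, and save the result under `args.sm_model_dir`.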

My initial approach using pure Keras models was based on bring-your-own-algo containers, similar to the answer by Matthew Arthur.

A good starting point for writing your own custom algorithms is the Scikit "Building Your Own Algorithm Container" tutorial. It gives you an overview of Docker and the steps for packaging your script into a container, uploading it, and running a training job.

If you already have an HDF5 file, you can just use the AWS CLI to upload it to a bucket that you own.

$ aws s3 cp ./path/to/file.h5 s3://my-sagemaker-bucket/folder/file.h5
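If you would rather do the upload from Python, the following is a minimal sketch using boto3, assuming boto3 is installed and AWS credentials are already configured. The helper names `split_s3_uri` and `upload_h5` are hypothetical; the bucket and key are the same placeholders used in the command above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_h5(local_path, s3_uri):
    """Upload a local file to the given S3 URI."""
    # Imported here so the URI helper works without boto3 installed.
    import boto3

    bucket, key = split_s3_uri(s3_uri)
    boto3.client("s3").upload_file(local_path, bucket, key)


# Example call (performs a real upload, so it is left commented out):
# upload_h5("./path/to/file.h5", "s3://my-sagemaker-bucket/folder/file.h5")
```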

Then, when creating your training job, you can specify an input channel ( http://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html#SageMaker-CreateTrainingJob-request-InputDataConfig ):

[{ 
  "ChannelName": "train",
  "DataSource": {
    "S3DataSource": {
      "S3Uri": "s3://my-sagemaker-bucket/folder",
      "S3DataType": "S3Prefix",
      "S3DataDistributionType": "FullyReplicated"
    }
  }
}]
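The same channel definition can be built in Python, which is convenient if the training job is created programmatically. This is a minimal sketch: `build_input_channel` is a hypothetical helper, and the resulting list is the kind of value one would pass as `InputDataConfig` in a CreateTrainingJob request.

```python
import json


def build_input_channel(name, s3_prefix):
    """Build one InputDataConfig entry for an S3 prefix."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3Uri": s3_prefix,
                # Treat the URI as a prefix: every object under it is used.
                "S3DataType": "S3Prefix",
                # Copy the full dataset to every training instance.
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


input_data_config = [
    build_input_channel("train", "s3://my-sagemaker-bucket/folder")
]
print(json.dumps(input_data_config, indent=2))
```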

When the training job begins, your containerized script should be able to find the file on its local filesystem at /opt/ml/input/data/train/file.h5 and read it like a normal file. Note that "train" in this file path corresponds to the channel name you specified in the input-data-config.
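Inside the container, locating the file might look like the sketch below. The function names are hypothetical, and the `base` argument is only there so the lookup can be tested locally; actually reading the HDF5 contents (for example with h5py) is left to the training script.

```python
from pathlib import Path


def channel_dir(channel, base="/opt/ml/input/data"):
    """Return the local directory SageMaker fills for a given channel."""
    return Path(base) / channel


def find_h5_files(channel="train", base="/opt/ml/input/data"):
    """List the .h5 files available in a channel, sorted by name."""
    directory = channel_dir(channel, base)
    if not directory.is_dir():
        raise FileNotFoundError(f"channel directory not found: {directory}")
    return sorted(directory.glob("*.h5"))
```

With the channel configured as above, `find_h5_files("train")` would be expected to return a single entry pointing at `/opt/ml/input/data/train/file.h5`.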

You can read more at https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html

Keras is now natively supported in SageMaker with both the TensorFlow and MXNet built-in frameworks. You can train and deploy with SageMaker, or you can import existing Keras models in TensorFlow Serving format and deploy them.

Here's a detailed example: https://aws.amazon.com/blogs/machine-learning/train-and-deploy-keras-models-with-tensorflow-and-apache-mxnet-on-amazon-sagemaker/
