Training multiple models in AWS SageMaker

Original post 2020-03-18 03:46:08 4 1 amazon-web-services / amazon-sagemaker

Can I train multiple models in AWS SageMaker by evaluating the models in the train.py script, and how do I get back multiple metrics from multiple models?

Any links, docs or videos would be useful.

1 answer

Yes. What you write in a SageMaker training script (assuming you use something that lets you pass custom code, such as your own container or a framework container) is flexible and does not need to be just one model, or even ML. You can definitely write multiple model trainings in a single container and pull all related metrics using SageMaker metric capture via regex; see an example regex here with the Sklearn random forest. That being said, it is often a better idea to separate things and have one model per SageMaker job, for the following reasons, among others:

It allows you to separate model metadata and metrics and compare them easily with the SageMaker metadata service.
It allows you to specialize hardware for each model and get better economics. Each model has its own sweet spot when it comes to CPU, GPU, and RAM.
It allows you to use the exact same container for single training but also for Bayesian hyperparameter search, a method that can be both faster and cheaper than regular grid search.
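
For the single-script approach mentioned in the answer, here is a minimal sketch of how several models could each emit their own metrics and be captured by regex. The model names, metric names, and log-line format are assumptions made up for illustration; only the `Name`/`Regex` dictionary shape follows what SageMaker's `metric_definitions` setting expects. The `extract_metrics` helper is a local stand-in that lets you check the regexes before launching a job; it is not part of SageMaker.

```python
import re

# Hypothetical model and metric names, chosen only for illustration.
MODEL_NAMES = ["random_forest", "gradient_boosting"]
METRIC_NAMES = ["accuracy", "f1"]


def format_metric(model_name, metric_name, value):
    """Build one log line per (model, metric) pair, e.g. 'random_forest-accuracy: 0.9123;'."""
    return f"{model_name}-{metric_name}: {value:.4f};"


def build_metric_definitions(model_names, metric_names):
    """Build Name/Regex pairs in the shape used for SageMaker metric definitions."""
    definitions = []
    for model_name in model_names:
        for metric_name in metric_names:
            # The names above contain no regex metacharacters, so no escaping is needed.
            key = f"{model_name}-{metric_name}"
            definitions.append({"Name": key, "Regex": f"{key}: ([0-9.]+);"})
    return definitions


def extract_metrics(log_text, definitions):
    """Local stand-in for metric capture: apply each regex to the log text."""
    found = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_text)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


if __name__ == "__main__":
    # In a real train.py these values would come from evaluating each fitted model.
    placeholder_scores = {name: 0.0 for name in MODEL_NAMES}
    for model_name, score in placeholder_scores.items():
        for metric_name in METRIC_NAMES:
            print(format_metric(model_name, metric_name, score))
```

The list returned by `build_metric_definitions` is what you would pass as `metric_definitions` when creating the estimator, so that each model's metrics show up under separate names.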
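
For the one-model-per-job approach, a rough sketch of a `train.py` entry point that picks the model from a hyperparameter might look like the following. It assumes a script-mode framework container, where hyperparameters arrive as command-line flags. The flag names, the metric name, and the `train_and_evaluate` placeholder are hypothetical; the actual data loading and model fitting are left out.

```python
import argparse


def parse_args(argv=None):
    """Parse hyperparameters passed to the script as command-line flags."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameter names, chosen for illustration only.
    parser.add_argument(
        "--model-type",
        choices=["random_forest", "gradient_boosting"],
        default="random_forest",
    )
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=5)
    # Ignore any extra arguments the runtime may add.
    args, _unknown = parser.parse_known_args(argv)
    return args


def train_and_evaluate(args):
    """Placeholder for fitting the selected model and scoring it on validation data."""
    # A real script would load data, fit the model named by args.model_type,
    # and return its validation score. A fixed value keeps this sketch runnable.
    return 0.0


def main(argv=None):
    args = parse_args(argv)
    score = train_and_evaluate(args)
    # One metric line per job; a regex such as "validation-accuracy: ([0-9.]+);"
    # could capture it.
    line = f"validation-accuracy: {score:.4f};"
    print(line)
    return line


if __name__ == "__main__":
    main()
```

Because the script only reads flags, the same container can serve a single training job with fixed values or a tuning job (for example via the SageMaker Python SDK's `HyperparameterTuner`) that varies them, and each job can be launched on the instance type that suits its model.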