
How to load a training set in AWS SageMaker to build a model?

Asked 2018-03-08 08:33:40 | Tags: amazon-web-services, amazon-sagemaker

I am very new to SageMaker. On my first interaction, it looks like AWS SageMaker requires you to start from its notebook. I have a training set that is ready. Is there a way to bypass setting up the notebook and just start by uploading the training set? Or should it be done through the notebook? If anyone knows an example that fits my need, that would be great.

1 answer

Amazon SageMaker is a combination of multiple services, each independent of the others. You can use notebook instances if you want to develop your models in the familiar Jupyter environment. But if you just need to train a model, you can use training jobs without opening a notebook instance.

There are a few ways to launch a training job:

Use the high-level SageMaker SDK for Python, which resembles the way you start a training step in ordinary Python code:

kmeans.fit(kmeans.record_set(train_set[0]))

Here is the link to the Python library: https://github.com/aws/sagemaker-python-sdk
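A minimal sketch of the high-level route for a training set that is already prepared, assuming it is a local headerless CSV of numeric features and assuming SageMaker Python SDK v2 parameter names (`instance_count`, `instance_type`; v1 used `train_instance_count` and `train_instance_type`). The helper names `load_csv_features` and `train_kmeans` are hypothetical, and the role ARN and S3 prefix are placeholders you must replace. `record_set` converts the array to RecordIO-protobuf and uploads it to S3, so this can run from any machine with AWS credentials, no notebook instance required.

```python
import csv


def load_csv_features(path):
    """Hypothetical helper: read a headerless CSV of numbers into float rows."""
    with open(path, newline="") as f:
        # Skip blank lines, which csv.reader returns as empty lists.
        return [[float(value) for value in row] for row in csv.reader(f) if row]


def train_kmeans(features, role_arn, s3_prefix, k=10):
    """Hypothetical helper: launch a built-in K-Means training job.

    Needs the `sagemaker` and `numpy` packages plus AWS credentials.
    role_arn and s3_prefix (e.g. "s3://my-bucket/kmeans") are placeholders.
    """
    # Imported lazily so the CSV helper above works without these packages.
    import numpy as np
    from sagemaker import KMeans

    kmeans = KMeans(
        role=role_arn,
        instance_count=1,
        instance_type="ml.c4.xlarge",
        output_path=f"{s3_prefix}/output",   # where the model artifact lands
        data_location=f"{s3_prefix}/input",  # where record_set uploads data
        k=k,
    )
    # The built-in algorithms expect float32 feature matrices.
    train_array = np.array(features, dtype="float32")
    # record_set uploads the array to S3; fit starts the training job.
    kmeans.fit(kmeans.record_set(train_array))
    return kmeans
```

For example, `train_kmeans(load_csv_features("train.csv"), "arn:aws:iam::111122223333:role/MySageMakerRole", "s3://my-bucket/kmeans")`, with your own role and bucket in place of these made-up values.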

Use the low-level CreateTrainingJob API, which you can call from any of the AWS SDKs (Java, Python, JavaScript, C#, ...) or from the CLI:

import boto3
sagemaker = boto3.client('sagemaker')
sagemaker.create_training_job(**create_training_params)

Here is a link to the documentation on these options: https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model-create-training-job.html
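For the low-level route, `create_training_params` is just a dict of CreateTrainingJob request fields, and the training set has to be in S3 first. The sketch below uploads a local file and builds such a dict for a built-in algorithm reading CSV data. The function names are hypothetical, and the image URI, role ARN, bucket, and job name are placeholders: built-in algorithm image URIs are region-specific, so look yours up in the SageMaker documentation. Only `build_training_params` runs locally; the other two helpers need boto3 and AWS credentials.

```python
def upload_training_set(local_path, bucket, key):
    """Hypothetical helper: copy a local training file to S3, return its URI."""
    import boto3  # lazy import: only needed when actually talking to AWS

    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def build_training_params(job_name, image_uri, role_arn, train_s3_uri,
                          output_s3_uri, hyperparameters,
                          instance_type="ml.c4.xlarge", instance_count=1):
    """Hypothetical helper: build a CreateTrainingJob request dict."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # "File" mode copies the S3 data onto the instance before training.
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "text/csv",
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # The API expects every hyperparameter value as a string.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
    }


def launch(params):
    """Submit the job, wait for it to finish, and return its final status."""
    import boto3

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_training_job(**params)
    waiter = sagemaker.get_waiter("training_job_completed_or_stopped")
    waiter.wait(TrainingJobName=params["TrainingJobName"])
    job = sagemaker.describe_training_job(TrainingJobName=params["TrainingJobName"])
    return job["TrainingJobStatus"]
```

The same request can also go through the CLI: write the dict to a file with `json.dump` and pass it to `aws sagemaker create-training-job --cli-input-json file://params.json`.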

Use the Spark interface, which launches the job through an interface similar to creating an MLlib training job:

val estimator = new KMeansSageMakerEstimator(
  sagemakerRole = IAMRole(roleArn),
  trainingInstanceType = "ml.p2.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.c4.xlarge",
  endpointInitialInstanceCount = 1)
  .setK(10)
  .setFeatureDim(784)

val model = estimator.fit(trainingData)

Here is a link to the spark-sagemaker library: https://github.com/aws/sagemaker-spark
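The same repository also ships a PySpark package, `sagemaker_pyspark`. The sketch below mirrors the Scala call, assuming `training_data` is a Spark DataFrame with a `features` vector column and that you supply your own role ARN; the function names and the settings dict are hypothetical. By default the estimator also sets up a hosting endpoint for the returned model, which is what the `endpoint*` settings are for. Only the settings dict can be inspected without Spark and AWS.

```python
# Same instance settings as the Scala example.
TRAINING_SETTINGS = {
    "trainingInstanceType": "ml.p2.xlarge",
    "trainingInstanceCount": 1,
    "endpointInstanceType": "ml.c4.xlarge",
    "endpointInitialInstanceCount": 1,
}


def spark_conf_with_sagemaker_jars():
    """Hypothetical helper: put the SageMaker Spark jars on the driver classpath."""
    from pyspark import SparkConf
    from sagemaker_pyspark import classpath_jars

    return SparkConf().set("spark.driver.extraClassPath", ":".join(classpath_jars()))


def fit_kmeans_on_spark(training_data, role_arn, k=10, feature_dim=784):
    """Hypothetical helper: train K-Means on SageMaker from a Spark DataFrame."""
    # Lazy imports: requires pyspark and sagemaker_pyspark on the driver.
    from sagemaker_pyspark import IAMRole
    from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

    estimator = KMeansSageMakerEstimator(
        sagemakerRole=IAMRole(role_arn), **TRAINING_SETTINGS
    )
    estimator.setK(k)
    estimator.setFeatureDim(feature_dim)
    # Runs a SageMaker training job and returns a SageMakerModel.
    return estimator.fit(training_data)
```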

Create a training job in the Amazon SageMaker console using its wizard: https://console.aws.amazon.com/sagemaker/home?region=us-east-1#/jobs

Note that there are also a few options for what you train. You can use the built-in algorithms such as K-Means, Linear Learner, or XGBoost (see the complete list here: https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html). But you can also bring your own model code to pre-built Docker images such as TensorFlow (https://docs.aws.amazon.com/sagemaker/latest/dg/tf.html) or MXNet (https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html), or bring your own Docker image (https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html).
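If you bring your own Docker image, SageMaker passes the training set and settings to the container through files under `/opt/ml`. Each input channel appears as `/opt/ml/input/data/<channel_name>/`, hyperparameters arrive as string values in `/opt/ml/input/config/hyperparameters.json`, and whatever the script writes to `/opt/ml/model/` is uploaded as `model.tar.gz` to the job's S3 output path. The sketch below shows the reading side of that layout; the function names are hypothetical, the channel is assumed to be named `train`, and the `root` argument exists only so the code can be tried outside a container.

```python
import json
import os


def read_training_inputs(root="/opt/ml", channel="train"):
    """Hypothetical helper: return (hyperparameters, training file paths)."""
    config_path = os.path.join(root, "input", "config", "hyperparameters.json")
    hyperparameters = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            # SageMaker writes every hyperparameter value as a string.
            hyperparameters = json.load(f)
    channel_dir = os.path.join(root, "input", "data", channel)
    # Sorted so the file order is deterministic across runs.
    files = sorted(
        os.path.join(channel_dir, name)
        for name in os.listdir(channel_dir)
        if os.path.isfile(os.path.join(channel_dir, name))
    )
    return hyperparameters, files


def model_dir(root="/opt/ml"):
    """Directory whose contents SageMaker packages as the model artifact."""
    return os.path.join(root, "model")
```

A training script would typically call `read_training_inputs()`, cast the hyperparameters it needs (for example `int(hyperparameters["k"])`), train, and save its output under `model_dir()`.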