Training Keras model in AWS SageMaker

I have a Keras training script on my machine. I am experimenting with running my script in an AWS SageMaker container. For that I have used the code below.

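from sagemaker.tensorflow import TensorFlow

est = TensorFlow(
    entry_point="caller.py",    # training script run inside the container
    source_dir="./",            # local directory uploaded together with the entry point
    role='role_arn',            # placeholder for an IAM role ARN
    framework_version="2.3.1",
    py_version="py37",
    instance_type='ml.m5.large',
    instance_count=1,
    hyperparameters={'batch': 8, 'epochs': 10},  # passed to caller.py as --batch / --epochs
)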

est.fit()

Here caller.py is my entry point. After executing the above code I get an error saying keras is not installed. Here is the stack trace:

Traceback (most recent call last):
  File "executor.py", line 14, in <module>
    est.fit()
  File "/home/thasin/Documents/python/venv/lib/python3.8/site-packages/sagemaker/estimator.py", line 682, in fit
    self.latest_training_job.wait(logs=logs)
  File "/home/thasin/Documents/python/venv/lib/python3.8/site-packages/sagemaker/estimator.py", line 1625, in wait
    self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
  File "/home/thasin/Documents/python/venv/lib/python3.8/site-packages/sagemaker/session.py", line 3681, in logs_for_job
    self._check_job_status(job_name, description, "TrainingJobStatus")
  File "/home/thasin/Documents/python/venv/lib/python3.8/site-packages/sagemaker/session.py", line 3240, in _check_job_status
    raise exceptions.UnexpectedStatusException(
sagemaker.exceptions.UnexpectedStatusException: Error for Training job tensorflow-training-2021-06-09-07-14-01-778: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/usr/local/bin/python3.7 caller.py --batch 4 --epochs 10

ModuleNotFoundError: No module named 'keras'

  1. Which instance has Keras pre-installed?
  2. Is there any way I can install the Python package in the AWS container, or any workaround for the issue?

Note: I have tried building my own container and uploading it to ECR, and my code ran successfully. I am looking for a solution that uses AWS's existing containers.

Keras is now part of TensorFlow, so you can simply rewrite your code to use tf.keras instead of keras. The standalone Keras API has been kept in sync with tf.keras since Keras 2.3.0, so the migration should not be that difficult. Your container is the prebuilt SageMaker TensorFlow 2.3.1 training image, and its list of installed packages does not include Keras. If you instead want to extend a prebuilt container, that is possible, but I don't recommend it for this specific use case: for future code maintainability you should go with tf.keras.
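For illustration, here is a minimal sketch of how caller.py might look after switching to tf.keras. The original script is not shown, so the model, the random toy data and the output subfolder are hypothetical placeholders. The script reads `--batch` and `--epochs` because the estimator passes its `hyperparameters` to the entry point as command-line flags (the failing command in the log is `caller.py --batch 4 --epochs 10`), and it uses `parse_known_args()` so that any extra flags the container adds are ignored. `SM_MODEL_DIR` is the environment variable the SageMaker training container sets to the directory that is uploaded as the model artifact.

```python
import argparse
import os


def parse_args(argv=None):
    """Read the hyperparameters that SageMaker passes as command-line flags.

    hyperparameters={'batch': 8, 'epochs': 10} on the estimator arrive as
    `--batch 8 --epochs 10`; parse_known_args() skips any extra flags the
    container may add (for example --model_dir).
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch", type=int, default=8)
    parser.add_argument("--epochs", type=int, default=10)
    args, _unknown = parser.parse_known_args(argv)
    # Inside the training container SM_MODEL_DIR is /opt/ml/model; fall back
    # to a local folder when running outside SageMaker.
    args.model_output = os.environ.get("SM_MODEL_DIR", "./model")
    return args


def build_model(n_features):
    # Use the Keras that ships with TensorFlow instead of `import keras`,
    # which is not installed in the prebuilt TensorFlow image.
    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(n_features,)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def main(argv=None):
    import numpy as np

    args = parse_args(argv)
    # Hypothetical toy data; replace with the real training data.
    x = np.random.rand(64, 4).astype("float32")
    y = x.sum(axis=1, keepdims=True)
    model = build_model(n_features=x.shape[1])
    model.fit(x, y, batch_size=args.batch, epochs=args.epochs, verbose=2)
    # Save a SavedModel in a version-numbered subfolder, the layout
    # TensorFlow Serving expects.
    model.save(os.path.join(args.model_output, "1"))


# SageMaker runs caller.py as a script, so the real file ends with:
# if __name__ == "__main__":
#     main()
```

Because `tensorflow` is only imported inside the functions, the file can be imported (for example to test the argument parsing) on a machine without TensorFlow; on SageMaker the prebuilt TensorFlow image already provides it, so no extra package has to be installed.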
