I am trying to set up a bring-your-own-model training job on SageMaker. My code is written in R. When I try to run the job, it fails.
Training Image:
FROM r-base:3.6.3
MAINTAINER Amazon SageMaker Examples <amazon-sagemaker-examples@amazon.com>
RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    r-base \
    r-base-dev \
    apt-transport-https \
    ca-certificates \
    python3 python3-dev pip
ENV AWS_DEFAULT_REGION="us-east-2"
RUN R -e "install.packages('reticulate', dependencies = TRUE, warning = function(w) stop(w))"
RUN R -e "install.packages('readr', dependencies = TRUE, warning = function(w) stop(w))"
RUN R -e "install.packages('dplyr', dependencies = TRUE, warning = function(w) stop(w))"
RUN pip install --quiet --no-cache-dir \
    'boto3>1.0<2.0' \
    'sagemaker>2.0<3.0'
ENTRYPOINT ["/usr/bin/Rscript"]
Source code:
rcode
├── train.R
└── train.tar.gz
Build
- aws s3 cp $CODEBUILD_SRC_DIR/rcode/ s3://${self:custom.deploymentBucket}/${self:service}/code/training --recursive
Serverless.com yaml
SagemakerRCodeTrainingStep:
  Type: Task
  Resource: ${self:custom.sageMakerTrainingJob}
  Parameters:
    TrainingJobName.$: "$.sageMakerTrainingJobName"
    DebugHookConfig:
      S3OutputPath: "s3://${self:custom.deploymentBucket}/${self:service}/models/rmodel"
    AlgorithmSpecification:
      TrainingImage: ${self:custom.sagemakerRExecutionContainerURI}
      TrainingInputMode: "File"
    OutputDataConfig:
      S3OutputPath: "s3://${self:custom.deploymentBucket}/${self:service}/models/rmodel"
    StoppingCondition:
      MaxRuntimeInSeconds: ${self:custom.maxRuntime}
    ResourceConfig:
      InstanceCount: 1
      InstanceType: "ml.m5.xlarge"
      VolumeSizeInGB: 30
    RoleArn: ${self:custom.stateMachineRoleARN}
    InputDataConfig:
      - DataSource:
          S3DataSource:
            S3DataType: "S3Prefix"
            S3Uri: "s3://${self:custom.datasetsFilePath}/data/processed/train"
            S3DataDistributionType: "FullyReplicated"
        ChannelName: "train"
    HyperParameters:
      sagemaker_submit_directory: "s3://${self:custom.deploymentBucket}/${self:service}/code/training/train.tar.gz"
      sagemaker_program: "train.R"
      sagemaker_enable_cloudwatch_metrics: "false"
      sagemaker_container_log_level: "20"
      sagemaker_job_name: "sagemaker-r-learn-2022-02-28-09-56-33-234"
      sagemaker_region: ${self:provider.region}
I am not sure which TrainingImage you are using or which files are in your container. That said, I suspect you are using a custom container.
SageMaker Training Jobs look for a train file and run your container as follows:
docker run image train
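To see what that invocation means for a given Dockerfile, here is a small sketch of how Docker combines an exec-form ENTRYPOINT with the arguments that follow the image name. It is a simplified model, not Docker's implementation: the function name effective_command is made up for illustration, shell-form ENTRYPOINT/CMD is ignored, and the /opt/ml/code/train.R path is a hypothetical location for the script.

```python
def effective_command(entrypoint, run_args, cmd=None):
    """Simplified model of the process Docker starts for `docker run image <run_args>`.

    entrypoint: exec-form ENTRYPOINT from the Dockerfile as a list, or None.
    run_args:   arguments given after the image name (SageMaker passes ["train"]).
    cmd:        exec-form CMD from the Dockerfile; used only when run_args is empty.
    """
    # Arguments on the command line replace CMD; they never replace ENTRYPOINT.
    args = list(run_args) if run_args else list(cmd or [])
    return list(entrypoint or []) + args


# ENTRYPOINT from the question: Rscript is asked to run a file literally named "train".
print(effective_command(["/usr/bin/Rscript"], ["train"]))
# → ['/usr/bin/Rscript', 'train']

# No ENTRYPOINT: the image must provide an executable called "train" on its PATH.
print(effective_command(None, ["train"]))
# → ['train']

# ENTRYPOINT that names the script (hypothetical path): train.R receives "train" as an argument.
print(effective_command(["/usr/bin/Rscript", "/opt/ml/code/train.R"], ["train"]))
# → ['/usr/bin/Rscript', '/opt/ml/code/train.R', 'train']
```

With only /usr/bin/Rscript as the ENTRYPOINT, Rscript has no script to run unless a file named train exists in the working directory, which would explain a job that fails right at startup.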
You can change this behavior by setting the ENTRYPOINT in your Dockerfile. Please see this example Dockerfile from the r_byo_r_algo_hpo example.
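Another option is to ship a small train executable that launches the R script itself. The sketch below is one possible shape for it, under assumptions that are not in the question: the R sources are copied into the image under /opt/ml/code (a hypothetical location), and the helper names build_command and main are made up. It relies on SageMaker writing the job's HyperParameters to /opt/ml/input/config/hyperparameters.json inside the container. In an image without the SageMaker training toolkit, nothing downloads sagemaker_submit_directory for you, so this sketch only reads sagemaker_program and ignores the tarball.

```python
#!/usr/bin/env python3
"""Hypothetical `train` entry script for a custom R training container."""
import json
import os
import subprocess

# SageMaker writes the job's HyperParameters (all values as strings) to this file.
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"
# Assumption: the Dockerfile copied the R sources here, e.g. COPY rcode/ /opt/ml/code/
CODE_DIR = "/opt/ml/code"


def build_command(hyperparameters_path=HYPERPARAMETERS_PATH, code_dir=CODE_DIR):
    """Return the Rscript command for the program named in the hyperparameters."""
    with open(hyperparameters_path) as f:
        hyperparameters = json.load(f)
    # Fall back to train.R when sagemaker_program is not set.
    program = hyperparameters.get("sagemaker_program", "train.R")
    return ["/usr/bin/Rscript", os.path.join(code_dir, program)]


def main():
    """Run the R script and return its exit code (non-zero marks the job as failed)."""
    return subprocess.call(build_command())
```

To use it, the file would be saved as an executable named train on the image's PATH, with a final line such as `raise SystemExit(main())`, and the ENTRYPOINT line removed from the Dockerfile so that `docker run image train` resolves to this script.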