I am trying to replicate the example below for churn prediction: https://towardsdatascience.com/a-practical-guide-to-mlops-in-aws-sagemaker-part-i-1d28003f565
preprocessing.py has to import sagemaker, but it throws ModuleNotFoundError when I run the pipeline. The same sagemaker package is also imported in pipeline.py, where it works fine. Please let me know how to install packages in the Studio environment, including the syntax. I tried both pip and conda install in a cell of another .ipynb file, but all I get is a "Requirement already satisfied" message.
So probably the first thing to understand here is that the steps in a SageMaker pipeline don't actually run inside of SageMaker Studio, but in containerized jobs.
What I think you're seeing is that the SageMaker Python SDK (which is open-source and published on PyPI as sagemaker) is present in your Studio notebook kernel where you set up the pipeline, but missing from the processing job that runs preprocessing.py.
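If you want to confirm this, one option is a small check at the top of preprocessing.py that prints which modules the job's Python can actually see; the output should end up in the processing job's logs. This is only a sketch, and is_installed is a helper name I made up for illustration:

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Print the interpreter path too, since the notebook kernel and the job
# container each have their own Python environment.
print(f"python: {sys.executable}")
for name in ("boto3", "sagemaker"):
    print(f"{name}: {'available' if is_installed(name) else 'MISSING'}")
```

Running the same lines in a Studio notebook cell and inside the job should show the difference between the two environments.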
I see the pipeline uses a ScriptProcessor based on the XGBoost v1.0-1 image (image_uri and script_eval in pipeline.py), so it looks like this particular image doesn't have sagemaker installed by default.
In fact, preprocessing.py only seems to be using the library for the purpose of looking up the name of the SageMaker default bucket. You could achieve the same result with only boto3 (which should already be installed) as follows:
import boto3

# Rebuild the default bucket name from the account ID and region
account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name
trans_bucket = f"sagemaker-{region}-{account_id}"
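If you'd rather keep that logic in one place, a sketch like the following wraps it in a function and fails loudly when no region is configured. The names default_bucket_name and resolve_default_bucket are hypothetical; the naming pattern is the one from the snippet above.

```python
def default_bucket_name(region: str, account_id: str) -> str:
    """Build the default bucket name from a region and an account ID."""
    return f"sagemaker-{region}-{account_id}"


def resolve_default_bucket() -> str:
    """Look up region and account with boto3 only (needs AWS credentials)."""
    import boto3  # imported lazily so the pure helper above has no dependencies

    session = boto3.Session()
    region = session.region_name
    if region is None:
        raise RuntimeError("No AWS region configured for this session")
    account_id = session.client("sts").get_caller_identity()["Account"]
    return default_bucket_name(region, account_id)


print(default_bucket_name("eu-west-1", "123456789012"))
# prints: sagemaker-eu-west-1-123456789012
```

Note that this only computes the name. Unlike the SDK's Session.default_bucket(), it won't create the bucket if it doesn't exist yet, so it assumes the bucket was already created (for example by the notebook that defines the pipeline).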
If you really need to install extra libraries for your processing jobs, I would suggest checking out FrameworkProcessor (which can install sagemaker if you provide a requirements.txt file) instead of ScriptProcessor, but watch out: there have been some bug reports when using FrameworkProcessor and Pipelines together.
If FrameworkProcessor isn't working, you could instead build your own container image FROM the pre-provided one and pip install sagemaker in the Dockerfile. You would upload this customized image to Amazon ECR and then reference it in your pipeline instead of the standard XGBoost one.