AWS Studio ModuleNotFoundError: No module named 'sagemaker'

Question

I am trying to replicate the churn-prediction example below: https://towardsdatascience.com/a-practical-guide-to-mlops-in-aws-sagemaker-part-i-1d28003f565

preprocessing.py has to import sagemaker, but it throws ModuleNotFoundError when I run the pipeline. The same sagemaker package is also imported in pipeline.py, where it works fine. How can I install packages in the Studio environment, and what is the syntax? I tried pip install and conda install from a cell in another .ipynb file, but both only report "Requirement already satisfied".

Answer 1

So probably the first thing to understand here is that the steps in a SageMaker pipeline don't actually run inside of SageMaker Studio, but in containerized jobs.

What I think you're seeing is that the SageMaker Python SDK (which is open-source and published on PyPI as sagemaker) is present in your Studio notebook kernel where you set up the pipeline, but missing from the processing job that runs preprocessing.py.
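One way to confirm this is to log, from inside the script itself, which interpreter is running and whether the module can be found. The following is a minimal sketch using only the standard library; the helper name describe_environment is hypothetical and not part of the original example.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "sagemaker") -> dict:
    """Report the running interpreter and whether a top-level module is importable."""
    # find_spec returns None when a top-level module is not installed
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "module": module_name,
        "available": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Printed output ends up in the job logs when run inside a processing job
    print(describe_environment())
```

Running this in a notebook cell and again at the top of preprocessing.py should show two different interpreters, which is why installing a package from a notebook cell has no effect on the job.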

I see the pipeline uses a ScriptProcessor based on the XGBoost v1.0-1 image (image_uri and script_eval in pipeline.py), so it looks like this particular image doesn't have sagemaker installed by default.

In fact, preprocessing.py only seems to be using the library for the purpose of looking up the name of the SageMaker default bucket. You could achieve the same result with only boto3 (which should already be installed) as follows:

import boto3

# Rebuild the default bucket name from the account ID and region
account_id = boto3.client("sts").get_caller_identity()["Account"]
region = boto3.Session().region_name

trans_bucket = f"sagemaker-{region}-{account_id}"

If you really need to install extra libraries for your processing jobs, I would suggest checking out FrameworkProcessor instead of ScriptProcessor, since it can install sagemaker from a requirements.txt file that you provide. Watch out, though: there have been some bug reports about using FrameworkProcessor and Pipelines together.
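As a rough sketch of the FrameworkProcessor route, assuming a hypothetical source directory that holds preprocessing.py next to a requirements.txt: the helper names, the "src" directory, and the instance type below are illustrative assumptions, not taken from the original pipeline.

```python
from pathlib import Path
from typing import Sequence


def write_requirements(source_dir: str, packages: Sequence[str]) -> Path:
    """Write a requirements.txt into the directory shipped with the job."""
    path = Path(source_dir) / "requirements.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(packages) + "\n")
    return path


def build_framework_processor(role: str, instance_type: str = "ml.m5.xlarge"):
    """Create a FrameworkProcessor on the XGBoost 1.0-1 framework image."""
    # Imported lazily: the SDK is only needed where the pipeline is defined
    from sagemaker.processing import FrameworkProcessor
    from sagemaker.xgboost.estimator import XGBoost

    return FrameworkProcessor(
        estimator_cls=XGBoost,
        framework_version="1.0-1",
        role=role,
        instance_type=instance_type,
        instance_count=1,
    )


# Hypothetical usage from the notebook that defines the pipeline:
# write_requirements("src", ["sagemaker"])
# processor = build_framework_processor(role)
# processor.run(code="preprocessing.py", source_dir="src")
```

The requirements.txt has to sit inside source_dir so that it is uploaded together with the script.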

If FrameworkProcessor isn't working, you could instead build your own container image FROM the pre-provided one and pip install sagemaker in the Dockerfile. You would upload this customized image to Amazon ECR and then reference it in your pipeline instead of the standard XGBoost one.
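Referencing the customized image from the pipeline could look roughly like the sketch below. The repository name, tag, and helper names are hypothetical, and the URI format assumes the standard AWS partition (other partitions use a different domain suffix).

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the URI of an image stored in a private Amazon ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_script_processor(image_uri: str, role: str, instance_type: str = "ml.m5.xlarge"):
    """Create a ScriptProcessor that runs on the customized image."""
    # Imported lazily: the SDK is only needed where the pipeline is defined
    from sagemaker.processing import ScriptProcessor

    return ScriptProcessor(
        image_uri=image_uri,
        command=["python3"],
        role=role,
        instance_type=instance_type,
        instance_count=1,
    )


# Hypothetical usage, with an illustrative repository name and tag:
# image_uri = ecr_image_uri(account_id, region, "xgboost-with-sagemaker", "1.0-1")
# processor = build_script_processor(image_uri, role)
```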
