
How to create a pipeline in sagemaker with pytorch

I am dealing with a text classification problem in SageMaker. First I fit and transform the text data into a structured format (say, using TF-IDF in scikit-learn), then I store the result in an S3 bucket and use it to train my PyTorch model, whose code I have written in my entry point script.

If you notice, by the end of the above process I have two models:

  1. a scikit-learn TF-IDF model
  2. the actual PyTorch model

So every time I need to predict on new text data, I first have to separately process (transform) it with the TF-IDF model that I created during training.

How can I create a pipeline in SageMaker that combines scikit-learn's TF-IDF model and my PyTorch model?

If I fit and transform the text data using TF-IDF in the main method of my entry point and then train my PyTorch model there, I can return only one model, which will be used in model_fn().
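For concreteness, the two-step workflow described above can be sketched as follows (the dataset, the linear classifier, and the file names are toy placeholders, not from the original post):

```python
# Sketch of the two-step workflow: fit TF-IDF, then train PyTorch on the result.
import joblib
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["good movie", "bad movie", "great film", "awful film"]
labels = torch.tensor([1, 0, 1, 0])

# Step 1: fit and transform the text with TF-IDF (model #1, scikit-learn).
tfidf = TfidfVectorizer()
features = torch.tensor(tfidf.fit_transform(texts).toarray(), dtype=torch.float32)

# Step 2: train a PyTorch classifier on the transformed features (model #2).
model = nn.Linear(features.shape[1], 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()

# Two separate artifacts result -- both are needed again at inference time.
joblib.dump(tfidf, "tfidf.joblib")
torch.save(model.state_dict(), "model.pt")
```

This makes the problem visible: any new text must pass through `tfidf.transform()` before it can reach the PyTorch model.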

First, check out the MNIST example here:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

With script mode, you can run the code (in mnist.py) using the estimator below.

from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='mnist.py',
                    role=role,
                    framework_version='1.1.0',
                    train_instance_count=2,
                    train_instance_type='ml.c4.xlarge',
                    hyperparameters={
                        'epochs': 6,
                        'backend': 'gloo'
                    })

Simply update the mnist.py script to include the TF-IDF pipeline. Hope this helps.
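One way to adapt the entry point is to save the fitted TF-IDF vectorizer alongside the PyTorch weights in the model directory during training, so that `model_fn()` can load and return both together. The serving functions below are an illustrative sketch only (the file names, JSON request shape, and linear classifier are assumptions, not part of the original answer):

```python
# Sketch of serving functions for the entry point, assuming training saved
# both tfidf.joblib and model.pt into the SageMaker model directory, so
# both end up packaged in the same model.tar.gz.
import json
import os

import joblib
import torch
import torch.nn as nn


def model_fn(model_dir):
    # Load both artifacts and return them together as one "model" object.
    tfidf = joblib.load(os.path.join(model_dir, "tfidf.joblib"))
    net = nn.Linear(len(tfidf.vocabulary_), 2)
    net.load_state_dict(torch.load(os.path.join(model_dir, "model.pt")))
    net.eval()
    return {"tfidf": tfidf, "net": net}


def input_fn(request_body, content_type="application/json"):
    # Expect a JSON payload like {"texts": ["some document", ...]}.
    return json.loads(request_body)["texts"]


def predict_fn(texts, model):
    # Apply the TF-IDF transform before the PyTorch forward pass.
    features = torch.tensor(
        model["tfidf"].transform(texts).toarray(), dtype=torch.float32
    )
    with torch.no_grad():
        return model["net"](features).argmax(dim=1).tolist()
```

With this layout a single endpoint handles raw text end to end, at the cost of bundling the scikit-learn dependency into the PyTorch container.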

Apparently, we need to use inference pipelines.

An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to five containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully managed.

One can read the docs here:

https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html

Example:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb
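Following that example, a deployment combining the two models might look roughly like this. It is a sketch in SageMaker Python SDK v1 style (to match the estimator in the other answer); the S3 paths, entry-point file names, pipeline name, and instance type are all placeholders:

```python
# Sketch: deploy a scikit-learn TF-IDF container and a PyTorch container
# as one inference pipeline endpoint (all paths/names are illustrative).
import sagemaker
from sagemaker.pipeline import PipelineModel
from sagemaker.pytorch import PyTorchModel
from sagemaker.sklearn import SKLearnModel

role = sagemaker.get_execution_role()

# Container 1: loads the fitted TfidfVectorizer and transforms raw text.
sklearn_model = SKLearnModel(
    model_data="s3://my-bucket/tfidf/model.tar.gz",
    role=role,
    entry_point="tfidf_transform.py",
)

# Container 2: runs the trained PyTorch classifier on the TF-IDF features.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/pytorch/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.1.0",
)

# Requests flow through the containers in order: TF-IDF first, then the
# classifier. The whole pipeline deploys behind a single endpoint.
pipeline_model = PipelineModel(
    name="tfidf-pytorch-pipeline",
    role=role,
    models=[sklearn_model, pytorch_model],
)
predictor = pipeline_model.deploy(
    initial_instance_count=1, instance_type="ml.m4.xlarge"
)
```

Each container's entry point still needs its own input/output handling so that the output format of the TF-IDF container matches the input format the PyTorch container expects.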
