简体   繁体   中英

Should I run forecast predictive model with AWS lambda or sagemaker?

I've been reading some articles regarding this topic and have preliminary thoughts as what I should do with it, but still want to see if anyone can share comments if you have more experience with running machine learning on AWS. I was doing a project for a professor at school, and we decided to use AWS. I need to find a cost-effective and efficient way to deploy a forecasting model on it.

What we want to achieve is:

  • read the data from S3 bucket monthly (there will be new data coming in every month),
  • run a few python files (.py) for custom-built packages and install dependencies (including the files, no more than 30kb),
  • produce predicted results into a file back in S3 (JSON or CSV works), or push to other endpoints (most likely to be some BI tools - tableau etc.) - but really this step can be flexible (not web for sure)

First thought I have is AWS sagemaker . However, we'll be using "fb prophet" model to predict the results, and we built a customized package to use in the model, therefore, I don't think the notebook instance is gonna help us. (Please correct me if I'm wrong) My understanding is that sagemaker is a environment to build and train the model, but we already built and trained the model. Plus, we won't be using AWS pre-built models anyways.

Another thing is if we want to use custom-built package, we will need to create container image, and I've never done that before, not sure about the efforts to do that.

2nd option is to create multiple lambda functions

  • one that triggers to run the python scripts from S3 bucket (2-3.py files) every time a new file is imported into S3 bucket, which will happen monthly.

  • one that trigger after the python scripts are done running and produce results and save into S3 bucket.

3rd option will combine both options: - Use lambda function to trigger the implementation on the python scripts in S3 bucket when the new file comes in. - Push the result using sagemaker endpoint, which means we host the model on sagemaker and deploy from there.

I am still not entirely sure how to put pre-built model and python scripts onto sagemaker instance and host from there.

I'm hoping whoever has more experience with AWS service can help give me some guidance, in terms of more cost-effective and efficient way to run model.

Thank you!!

I would say it all depends on how heavy your model is / how much data you're running through it. You're right to identify that Lambda will likely be less work. It's quite easy to get a lambda up and running to do the things that you need, and Lambda has a very generous free tier . The problem is:

  1. Lambda functions are fundamentally limited in their processing capacity (they timeout after max 15 minutes).

  2. Your model might be expensive to load.

If you have a lot of data to run through your model, you will need multiple lambdas. Multiple lambdas means you have to load your model multiple times, and that's wasted work. If you're working with "big data" this will get expensive once you get through the free tier.

If you don't have much data, Lambda will work just fine. I would eyeball it as follows: assuming your data processing step is dominated by your model step, and if all your model interactions (loading the model + evaluating all your data) take less than 15min, you're definitely fine. If they take more, you'll need to do a back-of-the-envelope calculation to figure out whether you'd leave the Lambda free tier.

Regarding Lambda: You can literally copy-paste code in to setup a prototype. If your execution takes more than 15min for all your data, you'll need a method of splitting your data up between multiple Lambdas. Consider Step Functions for this.

SageMaker is a set of services that each is responsible for a different part of the Machine Learning process. What you might want to use is the hosted version of Jupyter notebooks in SageMaker. You get a lot of freedom in the size of the instance that you are using (CPU/GPU, memory, and disk), and you can install various packages on that instance (such as FB Prophet). If you need it once a month, you can stop and start the notebook instances between these times and "Run all" the cells in your notebooks on this instance. It will only cost you the minutes of execution.

regarding the other alternatives, it is not trivial to run FB Prophet in Lambda due to the size limit of the libraries that you can install on Lambda (to avoid too long cold start). You can also use ECS (container Service) where you can have much larger images, but you need to know how to build a Docker image of your code and endpoint to be able to call it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM