I'd like to ask a question about AWS SageMaker. I must confess that I'm quite a newbie to the subject, and therefore I was very happy with the SageMaker Canvas app. It is really easy to work with and gives me some nice results.
First of all, my model: I try to predict solar power production based on the time (dt), the AWS IoT thing name (thingname), cloud percentage (clouds), and temperature (temp). I have a CSV filled with data measured by IoT things:

clouds + temp + dt + thingname => import
dt,clouds,temp,import,thingname
2022-08-30 07:45:00+02:00,1.0,0.1577,0.03,***
2022-08-30 08:00:00+02:00,1.0,0.159,0.05,***
2022-08-30 08:15:00+02:00,1.0,0.1603,0.06,***
2022-08-30 08:30:00+02:00,1.0,0.16440000000000002,0.08,***
2022-08-30 08:45:00+02:00,,,0.09,***
2022-08-30 09:00:00+02:00,1.0,0.17,0.12,***
2022-08-30 09:15:00+02:00,1.0,0.1747,0.13,***
2022-08-30 09:30:00+02:00,1.0,0.1766,0.15,***
2022-08-30 09:45:00+02:00,0.75,0.1809,0.18,***
2022-08-30 10:00:00+02:00,1.0,0.1858,0.2,***
2022-08-30 10:15:00+02:00,1.0,0.1888,0.21,***
2022-08-30 10:30:00+02:00,0.75,0.1955,0.24,***
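Note that some rows have sensor gaps (like the 08:45 row above, which is missing clouds and temp), and Canvas imputes those silently during training. A minimal sketch with the standard library for spotting such rows, using a few of the rows above with the thing name replaced by a placeholder:

```python
import csv
import io

# A few sample rows from the training CSV above; the 08:45 row has
# empty clouds/temp fields (a sensor gap).
sample = """dt,clouds,temp,import,thingname
2022-08-30 08:30:00+02:00,1.0,0.16440000000000002,0.08,thing-1
2022-08-30 08:45:00+02:00,,,0.09,thing-1
2022-08-30 09:00:00+02:00,1.0,0.17,0.12,thing-1
"""

# Collect the timestamps of rows with at least one empty field
incomplete = [
    row["dt"]
    for row in csv.DictReader(io.StringIO(sample))
    if any(value == "" for value in row.values())
]
print(incomplete)  # → ['2022-08-30 08:45:00+02:00']
```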
In AWS SageMaker Canvas I upload the CSV and build the model. All of this is very easy, and on the Predict tab I upload a CSV in which the import column is missing and which contains API weather data for some future moments:
dt,thingname,temp,clouds
2022-09-21 10:15:00+02:00,***,0.1235,1.0
2022-09-21 10:30:00+02:00,***,0.1235,1.0
2022-09-21 10:45:00+02:00,***,0.1235,1.0
2022-09-21 11:00:00+02:00,***,0.1235,1.0
2022-09-21 11:15:00+02:00,***,0.12689999999999999,0.86
2022-09-21 11:30:00+02:00,***,0.12689999999999999,0.86
2022-09-21 11:45:00+02:00,***,0.12689999999999999,0.86
2022-09-21 12:00:00+02:00,***,0.12689999999999999,0.86
2022-09-21 12:15:00+02:00,***,0.1351,0.69
2022-09-21 12:30:00+02:00,***,0.1351,0.69
2022-09-21 12:45:00+02:00,***,0.1351,0.69
From this data SageMaker Canvas predicts some quite realistic numbers, from which I assume the model is nicely built. So I want to move this model to my Greengrass core device to do predictions on site. I found the location of the best model via the sharing link to the Jupyter notebook.
From reading the AWS docs I seem to have a few options to run the model on an edge device. It turns out that SageMaker used XGBoost to create the model, and I found the xgboost-model file and downloaded it to the device.
But here is where the trouble started: SageMaker Canvas never gives any info on how it formats the CSV, so I really have no clue how to make a prediction using the model. I do get some results when I feed it the same CSV file I used for the Canvas prediction, but the numbers are completely different and not realistic at all:
# pip install xgboost==1.6.2
import xgboost as xgb

filename = 'solar-prediction-data.csv'
# Load the CSV directly as a DMatrix (expects headerless, numeric data)
dpredict = xgb.DMatrix(f'{filename}?format=csv')

model = xgb.Booster()
model.load_model('xgboost-model')

result = model.predict(dpredict)
print('Prediction result:')
print(result)
I read that the column order matters and that the CSV must not contain a header, but the output does not get close to the SageMaker Canvas result.
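For the headerless, column-ordered requirement, a sketch of the reordering step with the standard library. Note that the target column order here is an assumption based on the training CSV layout; only the feature-engineering code Canvas generated knows the real layout:

```python
import csv
import io

# Assumed feature order, taken from the training CSV header minus the
# target column (dt,clouds,temp,import,thingname -> drop import).
train_order = ["dt", "clouds", "temp", "thingname"]

# The prediction CSV arrives as dt,thingname,temp,clouds; reorder the
# columns and drop the header before handing the file to DMatrix.
src = io.StringIO(
    "dt,thingname,temp,clouds\n"
    "2022-09-21 10:15:00+02:00,thing-1,0.1235,1.0\n"
)
out = io.StringIO()
writer = csv.writer(out)
for row in csv.DictReader(src):
    writer.writerow([row[col] for col in train_order])

print(out.getvalue())  # headerless rows in the assumed training order
```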
I also tried using pandas:
# pip install xgboost==1.6.2
import xgboost as xgb
import pandas as pd

filename = 'solar-prediction-data.csv'
df = pd.read_csv(filename, index_col=None, header=None)
dpredict = xgb.DMatrix(df, enable_categorical=True)

model = xgb.Booster()
model.load_model('xgboost-model')

result = model.predict(dpredict, pred_interactions=True)
print('Prediction result:')
print('===============')
print(result)
But this last one always gives me the following error:
ValueError: DataFrame.dtypes for data must be int, float, bool or category. When
categorical type is supplied, DMatrix parameter `enable_categorical` must
be set to `True`. Invalid columns:dt, thingname
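The error itself is just about dtypes: DMatrix refuses the object-typed columns dt and thingname. A sketch of the kind of conversion that silences it — this does not reproduce Canvas's actual feature engineering (which lives in the model.joblib mentioned below), so the predictions may still be off, but every column becomes a dtype DMatrix accepts:

```python
import pandas as pd

# Two rows shaped like the prediction CSV above
df = pd.DataFrame({
    "dt": ["2022-09-21 10:15:00+02:00", "2022-09-21 10:30:00+02:00"],
    "thingname": ["thing-1", "thing-1"],
    "temp": [0.1235, 0.1235],
    "clouds": [1.0, 1.0],
})

# Turn the timestamp into numeric features and drop the raw string
ts = pd.to_datetime(df["dt"])
df["hour"] = ts.dt.hour
df["dayofyear"] = ts.dt.dayofyear
df = df.drop(columns=["dt"])

# Turn the thing name into a pandas category (requires
# enable_categorical=True when building the DMatrix)
df["thingname"] = df["thingname"].astype("category")

print(df.dtypes)  # only category/int/float dtypes remain
```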
To be honest, I'm completely stuck and hope someone here can give me some advice or a clue on how to proceed.
Thanks! Kind regards
Hacor
Hacor, Canvas AutoML creates artifacts, including the Python feature-engineering code and the feature-engineering model. You can access them for the best model under the Artifacts tab.
Thanks for the reply. It was indeed a part of the puzzle. I will try to turn this issue into a guide for other people who want to experiment with this subject.
As far as I have got now:
You go to your model in SageMaker Canvas and choose Share.
You visit the link and go to Best model -> Artifacts.
On this page, you download the following items (much appreciated @Danny):
Feature Engineering Model
Algorithm Model
Now you start a new Python 3.7 project with a virtual environment. Copy the downloaded xgboost-model file into this directory (do not extract it). Extract the Feature Engineering Model so that its code directory and the model.joblib file sit in the same root folder. Now, in the code directory, create a requirements.txt with the following contents:
sagemaker-scikit-learn-extension==2.5.0
numpy>=1.16.4
psutil
scikit-learn==0.23.2
python-dateutil==2.8.0
pandas==1.2.4
tsfresh==0.18.0
statsmodels==0.12.2
(The contents of this file are based on the sagemaker-scikit-learn-extension package; omitting them threw me errors. Link)
Then create a file called prediction.py with the following contents:
from sagemaker.xgboost import XGBoostModel
from sagemaker.local import LocalSession
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
import csv

DUMMY_IAM_ROLE = 'arn:aws:iam::111111111111:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000001'

LOCAL_SESSION = LocalSession()
# Ensure full code locality, see: https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode
LOCAL_SESSION.config = {'local': {'local_code': True}}

def main():
    xgb_inference_model = XGBoostModel(
        model_data='file://model.tar.gz',
        role=DUMMY_IAM_ROLE,
        entry_point="sagemaker_serve.py",
        source_dir="./solar-code",
        framework_version="1.3-1",
        sagemaker_session=LOCAL_SESSION,
    )

    serializer = CSVSerializer()
    deserializer = CSVDeserializer()

    print('Deploying endpoint in local mode')
    predictor = xgb_inference_model.deploy(
        initial_instance_count=1,
        instance_type="local",
        serializer=serializer,
    )

    predictions = predictor.predict(['2022-09-21 10:15:00+02:00', 'grnrg-zoersel', '0.1235', '1.0'])  # type: ignore
    print("Prediction: {}".format(predictions))

    print('About to delete the endpoint to stop paying (if in cloud mode).')
    predictor.delete_endpoint(predictor.endpoint_name)

if __name__ == "__main__":
    main()
Now install requirements:
pip install -r requirements.txt
And run the code for prediction:
python prediction.py
The prediction file is based on the XGBoost example in the AWS examples repo, with the input data adapted based on the CloudWatch logs of the SageMaker Canvas process. Two details were needed to make it run: the framework_version="1.3-1" setting, and the requirements.txt file inside the code directory. But for now the problem is that the Docker container returns an error (which I think is not really an error):
Prediction: [['Received data of unknown size. Expected number of columns is 4. Number of columns in the received data is 1.']]
I think the container is sending back the requested prediction, but it does not look correct here. Any suggestions?
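One hedged guess about the "expected 4 columns, received 1" message: sagemaker's CSVSerializer treats each element of a top-level list as a row, so a flat list of four feature strings becomes four one-column rows. The stand-in below mimics that behavior (it is a simplification for illustration, not the real class); if the guess is right, wrapping the features in an outer list, i.e. predictor.predict([features]), should fix the shape:

```python
features = ["2022-09-21 10:15:00+02:00", "thing-1", "0.1235", "1.0"]

def serialize(data):
    # Simplified stand-in for sagemaker.serializers.CSVSerializer:
    # each top-level element becomes one CSV row.
    rows = data if isinstance(data[0], (list, tuple)) else [[x] for x in data]
    return "\n".join(",".join(map(str, row)) for row in rows)

print(serialize(features))    # flat list -> four rows, one column each
print(serialize([features]))  # nested list -> one row, four columns
```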
TODO: build a new container with the requirements pre-installed. All code can be found in this repo.
Almost there; I hope someone can provide the last piece. And I do hope this will help others run trained models on an edge device.
Best regards
Hacor