I'd like to ask a question about AWS SageMaker. I must confess that I'm quite a newbie to the subject, and therefore I was very happy with the SageMaker Canvas app. It is really easy to work with and gives me some nice results.
First of all, my model: I try to predict solar power production based on the time (dt), the AWS IoT thing name (thingname), cloud percentage (clouds), and temperature (temp). I have a CSV filled with data measured by IoT things:

clouds + temp + dt + thingname => import
dt,clouds,temp,import,thingname
2022-08-30 07:45:00+02:00,1.0,0.1577,0.03,***
2022-08-30 08:00:00+02:00,1.0,0.159,0.05,***
2022-08-30 08:15:00+02:00,1.0,0.1603,0.06,***
2022-08-30 08:30:00+02:00,1.0,0.16440000000000002,0.08,***
2022-08-30 08:45:00+02:00,,,0.09,***
2022-08-30 09:00:00+02:00,1.0,0.17,0.12,***
2022-08-30 09:15:00+02:00,1.0,0.1747,0.13,***
2022-08-30 09:30:00+02:00,1.0,0.1766,0.15,***
2022-08-30 09:45:00+02:00,0.75,0.1809,0.18,***
2022-08-30 10:00:00+02:00,1.0,0.1858,0.2,***
2022-08-30 10:15:00+02:00,1.0,0.1888,0.21,***
2022-08-30 10:30:00+02:00,0.75,0.1955,0.24,***
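Note that some rows have sensor gaps (like the 08:45 row above, which is missing clouds and temp), and Canvas imputes those silently during training. A minimal sketch with the standard library for spotting such rows, using a few of the rows above with the thing name replaced by a placeholder:

```python
import csv
import io

# A few sample rows from the training CSV above; the 08:45 row has
# empty clouds/temp fields (a sensor gap).
sample = """dt,clouds,temp,import,thingname
2022-08-30 08:30:00+02:00,1.0,0.16440000000000002,0.08,thing-1
2022-08-30 08:45:00+02:00,,,0.09,thing-1
2022-08-30 09:00:00+02:00,1.0,0.17,0.12,thing-1
"""

# Collect the timestamps of rows with at least one empty field
incomplete = [
    row["dt"]
    for row in csv.DictReader(io.StringIO(sample))
    if any(value == "" for value in row.values())
]
print(incomplete)  # → ['2022-08-30 08:45:00+02:00']
```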
In AWS SageMaker Canvas I upload the CSV and build the model. All of this is very easy, and on the Predict tab I upload a CSV in which the import column is missing and which contains API weather data for some future moments:
dt,thingname,temp,clouds
2022-09-21 10:15:00+02:00,***,0.1235,1.0
2022-09-21 10:30:00+02:00,***,0.1235,1.0
2022-09-21 10:45:00+02:00,***,0.1235,1.0
2022-09-21 11:00:00+02:00,***,0.1235,1.0
2022-09-21 11:15:00+02:00,***,0.12689999999999999,0.86
2022-09-21 11:30:00+02:00,***,0.12689999999999999,0.86
2022-09-21 11:45:00+02:00,***,0.12689999999999999,0.86
2022-09-21 12:00:00+02:00,***,0.12689999999999999,0.86
2022-09-21 12:15:00+02:00,***,0.1351,0.69
2022-09-21 12:30:00+02:00,***,0.1351,0.69
2022-09-21 12:45:00+02:00,***,0.1351,0.69
From this data SageMaker Canvas predicts some quite realistic numbers, from which I assume the model is nicely built. So I want to move this model to my Greengrass core device to do predictions on site. I found the location of the best model via the sharing link to the Jupyter notebook.
From reading the AWS docs I seem to have a few options to run the model on an edge device. It turns out that SageMaker used XGBoost to create the model, and I found the xgboost-model file and downloaded it to the device.
But here is where the trouble started: SageMaker Canvas never gives any info on how it formats the CSV, so I really have no clue how to make a prediction using the model. I do get some results when I feed it the same CSV file I used for the Canvas prediction, but the numbers are completely different and not realistic at all:
# pip install xgboost==1.6.2
import xgboost as xgb

filename = 'solar-prediction-data.csv'
# Load the CSV directly as a DMatrix (expects headerless, numeric data)
dpredict = xgb.DMatrix(f'{filename}?format=csv')

model = xgb.Booster()
model.load_model('xgboost-model')

result = model.predict(dpredict)
print('Prediction result:')
print(result)
I read that the column order matters and that the CSV must not contain a header, but the output does not get close to the SageMaker Canvas result.
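For the headerless, column-ordered requirement, a sketch of the reordering step with the standard library. Note that the target column order here is an assumption based on the training CSV layout; only the feature-engineering code Canvas generated knows the real layout:

```python
import csv
import io

# Assumed feature order, taken from the training CSV header minus the
# target column (dt,clouds,temp,import,thingname -> drop import).
train_order = ["dt", "clouds", "temp", "thingname"]

# The prediction CSV arrives as dt,thingname,temp,clouds; reorder the
# columns and drop the header before handing the file to DMatrix.
src = io.StringIO(
    "dt,thingname,temp,clouds\n"
    "2022-09-21 10:15:00+02:00,thing-1,0.1235,1.0\n"
)
out = io.StringIO()
writer = csv.writer(out)
for row in csv.DictReader(src):
    writer.writerow([row[col] for col in train_order])

print(out.getvalue())  # headerless rows in the assumed training order
```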
I also tried using pandas:
# pip install xgboost==1.6.2
import xgboost as xgb
import pandas as pd

filename = 'solar-prediction-data.csv'
df = pd.read_csv(filename, index_col=None, header=None)
dpredict = xgb.DMatrix(df, enable_categorical=True)

model = xgb.Booster()
model.load_model('xgboost-model')

result = model.predict(dpredict, pred_interactions=True)
print('Prediction result:')
print('===============')
print(result)
But this last one always gives me the following error:
ValueError: DataFrame.dtypes for data must be int, float, bool or category. When
categorical type is supplied, DMatrix parameter `enable_categorical` must
be set to `True`. Invalid columns:dt, thingname
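The error itself is just about dtypes: DMatrix refuses the object-typed columns dt and thingname. A sketch of the kind of conversion that silences it — this does not reproduce Canvas's actual feature engineering (which lives in the model.joblib mentioned below), so the predictions may still be off, but every column becomes a dtype DMatrix accepts:

```python
import pandas as pd

# Two rows shaped like the prediction CSV above
df = pd.DataFrame({
    "dt": ["2022-09-21 10:15:00+02:00", "2022-09-21 10:30:00+02:00"],
    "thingname": ["thing-1", "thing-1"],
    "temp": [0.1235, 0.1235],
    "clouds": [1.0, 1.0],
})

# Turn the timestamp into numeric features and drop the raw string
ts = pd.to_datetime(df["dt"])
df["hour"] = ts.dt.hour
df["dayofyear"] = ts.dt.dayofyear
df = df.drop(columns=["dt"])

# Turn the thing name into a pandas category (requires
# enable_categorical=True when building the DMatrix)
df["thingname"] = df["thingname"].astype("category")

print(df.dtypes)  # only category/int/float dtypes remain
```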
To be honest, I'm completely stuck and hope someone here can give me some advice or a clue on how to proceed.
Thanks! Kind regards
Hacor
Hacor, Canvas AutoML creates artifacts, including the Python feature-engineering code and the feature-engineering model. You can access them for the best model under the Artifacts tab.
Thanks for the reply. It was indeed a part of the puzzle. I will try to turn this issue into a guide for other people who want to experiment with this subject.
As far as I have got now:
You go to your model in SageMaker Canvas and choose Share.
You visit the link and go to Best model -> Artifacts.
On this page, you download the following items (much appreciated @Danny):
Feature Engineering Model
Algorithm Model
Now you start a new Python 3.7 project with a virtual environment. Copy the downloaded xgboost-model file into this directory (do not extract it). Extract the Feature Engineering Model so that its code directory and the model.joblib file sit in the same root folder. Now, in the code directory, create a requirements.txt with the following contents:
sagemaker-scikit-learn-extension==2.5.0
numpy>=1.16.4
psutil
scikit-learn==0.23.2
python-dateutil==2.8.0
pandas==1.2.4
tsfresh==0.18.0
statsmodels==0.12.2
(The contents of this file are based on the sagemaker-scikit-learn-extension package; omitting them threw me errors. Link)
Then create a file called prediction.py with the following contents:
from sagemaker.xgboost import XGBoostModel
from sagemaker.local import LocalSession
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import CSVDeserializer
import csv

DUMMY_IAM_ROLE = 'arn:aws:iam::111111111111:role/service-role/AmazonSageMaker-ExecutionRole-20200101T000001'

LOCAL_SESSION = LocalSession()
# Ensure full code locality, see: https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode
LOCAL_SESSION.config = {'local': {'local_code': True}}

def main():
    xgb_inference_model = XGBoostModel(
        model_data='file://model.tar.gz',
        role=DUMMY_IAM_ROLE,
        entry_point="sagemaker_serve.py",
        source_dir="./solar-code",
        framework_version="1.3-1",
        sagemaker_session=LOCAL_SESSION,
    )

    serializer = CSVSerializer()
    deserializer = CSVDeserializer()

    print('Deploying endpoint in local mode')
    predictor = xgb_inference_model.deploy(
        initial_instance_count=1,
        instance_type="local",
        serializer=serializer,
    )

    predictions = predictor.predict(['2022-09-21 10:15:00+02:00', 'grnrg-zoersel', '0.1235', '1.0'])  # type: ignore
    print("Prediction: {}".format(predictions))

    print('About to delete the endpoint to stop paying (if in cloud mode).')
    predictor.delete_endpoint(predictor.endpoint_name)

if __name__ == "__main__":
    main()
Now install requirements:
pip install -r requirements.txt
And run the code for prediction:
python prediction.py
The prediction file is based on the XGBoost example in the AWS examples repo, with the input data adapted based on the CloudWatch logs of the SageMaker Canvas process. Two details were needed to make it run: the framework_version="1.3-1" setting, and the requirements.txt file inside the code directory. But for now the problem is that the Docker container returns an error (which I think is not really an error):
Prediction: [['Received data of unknown size. Expected number of columns is 4. Number of columns in the received data is 1.']]
I think the container is sending back the requested prediction, but it does not look correct here. Any suggestions?
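One hedged guess about the "expected 4 columns, received 1" message: sagemaker's CSVSerializer treats each element of a top-level list as a row, so a flat list of four feature strings becomes four one-column rows. The stand-in below mimics that behavior (it is a simplification for illustration, not the real class); if the guess is right, wrapping the features in an outer list, i.e. predictor.predict([features]), should fix the shape:

```python
features = ["2022-09-21 10:15:00+02:00", "thing-1", "0.1235", "1.0"]

def serialize(data):
    # Simplified stand-in for sagemaker.serializers.CSVSerializer:
    # each top-level element becomes one CSV row.
    rows = data if isinstance(data[0], (list, tuple)) else [[x] for x in data]
    return "\n".join(",".join(map(str, row)) for row in rows)

print(serialize(features))    # flat list -> four rows, one column each
print(serialize([features]))  # nested list -> one row, four columns
```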
TODO: build a new container with the requirements pre-installed. All code can be found in this repo.
Almost there; I hope someone can provide the last piece. And I do hope this will help others run trained models on an edge device.
Best regards
Hacor