
AWS Sagemaker inference endpoint not utilizing all vCPUs

I have deployed a custom model on a SageMaker inference endpoint (single instance). While load testing, I observed that the CPU utilization metric maxes out at 100%, but according to this post it should max out at #vCPU * 100%. I have confirmed in the CloudWatch logs that the inference endpoint is not using all cores.

So if one prediction call takes one second to process, the deployed model can handle only one API call per second. That could rise to 8 calls per second if all vCPUs were used.
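To make the arithmetic above explicit, here is a small sketch of the upper bound on throughput. It assumes requests are CPU-bound, independent, and that each worker keeps one core busy; `max_throughput` is a name made up for this illustration.

```python
def max_throughput(workers: int, seconds_per_call: float) -> float:
    """Upper bound on calls per second when each worker serves one call at a time."""
    if workers < 1 or seconds_per_call <= 0:
        raise ValueError("workers must be >= 1 and seconds_per_call must be > 0")
    return workers / seconds_per_call


# One worker, one second per prediction: the situation described in the question.
print(max_throughput(1, 1.0))
# Eight workers (one per vCPU), same latency: the hoped-for throughput.
print(max_throughput(8, 1.0))
```

Real throughput will usually land below this bound because of request parsing, queueing, and memory contention.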

Are there any settings in AWS Sagemaker deployment to use all vCPUs to increase concurrency?

Or could we use the Python multiprocessing package inside the inference.py file, so that each call arrives on the default core and the calculation/prediction is then done on whichever other core is free at that moment?

UPDATE


As for the question "Are there any settings in AWS Sagemaker deployment to use all vCPUs to increase concurrency?": there are various settings you can use. For models, you can set default_workers_per_model in config.properties, or TS_DEFAULT_WORKERS_PER_MODEL=$(nproc --all) in the environment variables. Environment variables take top priority.
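As a rough sketch of the environment-variable route from Python: the helper below only builds the environment mapping, and the commented lines show where it would be passed to the SageMaker Python SDK through the `env` argument of `Model`. `worker_env`, `IMAGE_URI`, `MODEL_DATA`, and `ROLE` are placeholders invented for this example. As far as I know, values passed this way are literal strings, so `$(nproc --all)` would not be expanded there; it only works where a shell evaluates it, such as a container entrypoint script.

```python
def worker_env(vcpu_count: int) -> dict:
    """Build the container environment that asks TorchServe for one worker per vCPU."""
    if vcpu_count < 1:
        raise ValueError("vcpu_count must be at least 1")
    # Environment values must be strings.
    return {"TS_DEFAULT_WORKERS_PER_MODEL": str(vcpu_count)}


# 8 vCPUs, matching the instance size implied in the question.
env = worker_env(8)
print(env)

# Assumed usage with the SageMaker Python SDK (not executed here):
# from sagemaker.model import Model
# model = Model(image_uri=IMAGE_URI, model_data=MODEL_DATA, role=ROLE, env=env)
# model.deploy(initial_instance_count=1, instance_type=INSTANCE_TYPE)
```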

Other than that, you can set the number of workers for each model through the management API, but sadly it is not possible to curl the management API in SageMaker. So TS_DEFAULT_WORKERS_PER_MODEL is the best bet. Setting it should make sure all cores are used.

But if you are using a Dockerfile, then in the entrypoint you can set up a script that waits for the model to load and then curls the management API to set the number of workers:

# load the model (the URL is quoted so the shell does not treat & as "run in background")
curl -X POST "localhost:8081/models?url=model_1.mar&batch_size=8&max_batch_delay=50"
# after loading the model it is possible to set min_worker, etc.
curl -v -X PUT "http://localhost:8081/models/model_1?min_worker=1"
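If the entrypoint script is written in Python rather than shell, the same two calls plus the "wait for model loading" step might be sketched like this, using only the standard library. The helper names (`register_url`, `scale_url`, `call`, `wait_for_model`) are made up for this example, and treating a 200 response from `GET /models/{name}` as "model is registered" is an assumption of the sketch.

```python
import time
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

MANAGEMENT = "http://localhost:8081"  # management_address from config.properties


def register_url(mar_file, batch_size, max_batch_delay):
    """URL for POST /models, with the query string properly encoded."""
    query = urlencode({"url": mar_file, "batch_size": batch_size,
                       "max_batch_delay": max_batch_delay})
    return f"{MANAGEMENT}/models?{query}"


def scale_url(model_name, min_worker):
    """URL for PUT /models/{model_name} to set the minimum number of workers."""
    return f"{MANAGEMENT}/models/{quote(model_name)}?{urlencode({'min_worker': min_worker})}"


def call(method, url):
    """Send one request to the management API and return (status, body)."""
    with urlopen(Request(url, method=method)) as resp:
        return resp.status, resp.read().decode()


def wait_for_model(model_name, attempts=30, delay=2.0):
    """Poll until the model is known to the server; return False if it never appears."""
    for _ in range(attempts):
        try:
            status, _body = call("GET", f"{MANAGEMENT}/models/{quote(model_name)}")
            if status == 200:
                return True
        except OSError:  # connection refused or HTTP error while the server starts
            pass
        time.sleep(delay)
    return False


print(register_url("model_1.mar", 8, 50))
print(scale_url("model_1", 1))
```

In an entrypoint this would run as `call("POST", register_url(...))`, then `wait_for_model("model_1")`, then `call("PUT", scale_url("model_1", n))` with `n` set to the number of cores.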

As for the other issue, that the logs confirm not all cores are used: I face the same issue and believe it is a problem in the logging system. Please look at this issue: https://github.com/pytorch/serve/issues/782. The community itself agrees that if the number of threads is not set, the log prints 0 by default, even though the server actually uses 2*num_cores.

For an exhaustive list of all possible configs:

# Reference: https://github.com/pytorch/serve/blob/master/docs/configuration.md
# Variables that can be configured through config.properties and Environment Variables
# NOTE: Variables which can be configured through environment variables **SHOULD** have a
# "TS_" prefix
# debug
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
model_store=/opt/ml/model
load_models=model_1.mar
# blacklist_env_vars
# default_workers_per_model
# default_response_timeout
# unregister_model_timeout
# number_of_netty_threads
# netty_client_threads
# job_queue_size
# number_of_gpu
# async_logging
# cors_allowed_origin
# cors_allowed_methods
# cors_allowed_headers
# decode_input_request
# keystore
# keystore_pass
# keystore_type
# certificate_file
# private_key_file
# max_request_size
# max_response_size
# default_service_handler
# service_envelope
# model_server_home
# snapshot_store
# prefer_direct_buffer
# allowed_urls
# install_py_dep_per_model
# metrics_format
# enable_metrics_api
# initial_worker_port

# Configuration options that are not documented or enabled through environment variables
# use_native_io=
# io_ratio=
# metric_time_interval=

# When the variable below is set to true, variables set in the environment have higher precedence.
# For example, the value of an environment variable overrides both command line arguments and a property in the configuration file. The value of a command line argument overrides a value in the configuration file.
# When set to false, environment variables are not used at all
enable_envvars_config=true
# model_snapshot=
# version=
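The precedence rule described in the comments of that file (environment variable over command line argument over config file, and only when enable_envvars_config is true) can be modelled with a few lines of Python. This is a simplified illustration of the rule, not TorchServe's actual implementation; `parse_properties` and `effective_config` are hypothetical helpers, and mapping `TS_FOO_BAR` to `foo_bar` is an assumption based on the "TS_" prefix note in the file.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def effective_config(file_props, cli_args, environ):
    """Merge settings: config file first, then command line, then TS_ environment variables."""
    merged = dict(file_props)
    merged.update(cli_args)
    # Environment variables are consulted only when enable_envvars_config is true.
    if merged.get("enable_envvars_config", "").lower() == "true":
        for name, value in environ.items():
            if name.startswith("TS_"):
                merged[name[len("TS_"):].lower()] = value
    return merged


props = parse_properties("default_workers_per_model=1\nenable_envvars_config=true\n")
cfg = effective_config(props, {}, {"TS_DEFAULT_WORKERS_PER_MODEL": "8"})
print(cfg["default_workers_per_model"])
```

With enable_envvars_config=false in the file, the same call leaves default_workers_per_model at the file's value.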
