SageMaker deploy model with inference code and requirements
I trained a TensorFlow model and now I would like to deploy it. The data needs to be preprocessed, so I have to provide an inference.py script and a requirements.txt file. When I deploy the model, it fails with the following error:
Failed Reason: The primary container for production variant All Traffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.
I am not using any VPC, and when I try to download a Python package from the notebook instance it works without any error. Apparently there is a problem with the connection and it can't install the dependencies. What can I do? The CloudWatch logs are below:
INFO:__main__:PYTHON SERVICE: True
INFO:__main__:starting services
INFO:__main__:using default model name: model
INFO:__main__:tensorflow serving model config:
model_config_list: {
  config: {
    name: 'model'
    base_path: '/opt/ml/model'
    model_platform: 'tensorflow'
    model_version_policy: {
      specific: {
        versions: 1
      }
    }
  }
}
INFO:__main__:tensorflow version info:
2021-07-15 14:48:01.085492: W external/org_tensorflow/tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-07-15 14:48:01.087774: W external/org_tensorflow/tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
TensorFlow ModelServer: 2.4.0-rc4+dev.sha.no_git
TensorFlow Library: 2.4.1
INFO:__main__:tensorflow serving command: tensorflow_model_server --port=15000 --rest_api_port=15001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0
INFO:__main__:started tensorflow serving (pid: 17)
INFO:tfs_utils:Trying to connect with model server: http://localhost:15001/v1/models/model
WARNING:urllib3.connectionpool:Retrying (Retry(total=8, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0400b14d90>: Failed to establish a new connection: [Errno 111] Connection refused')': /v1/models/model
2021-07-15 14:48:01.589503: W external/org_tensorflow/tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-07-15 14:48:01.589629: W external/org_tensorflow/tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-07-15 14:48:01.596877: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2021-07-15 14:48:01.596910: I tensorflow_serving/model_servers/server_core.cc:587] (Re-)adding model: model
2021-07-15 14:48:01.698159: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 1} exhausted max_num_retries: 0
2021-07-15 14:48:01.698222: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: model version: 1}
2021-07-15 14:48:01.698242: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 1}
2021-07-15 14:48:01.698259: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 1}
2021-07-15 14:48:01.698325: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:32] Reading SavedModel from: /opt/ml/model/000000001
2021-07-15 14:48:01.716135: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:55] Reading meta graph with tags { serve }
2021-07-15 14:48:01.716197: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:93] Reading SavedModel debug info (if present) from: /opt/ml/model/000000001
2021-07-15 14:48:01.720153: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 4. Tune using inter_op_parallelism_threads for best performance.
WARNING:urllib3.connectionpool:Retrying (Retry(total=7, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0400b29450>: Failed to establish a new connection: [Errno 111] Connection refused')': /v1/models/model
2021-07-15 14:48:01.825477: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-07-15 14:48:01.833263: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2300060000 Hz
2021-07-15 14:48:01.971809: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /opt/ml/model/000000001
2021-07-15 14:48:01.989851: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 291516 microseconds.
2021-07-15 14:48:01.992763: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /opt/ml/model/000000001/assets.extra/tf_serving_warmup_requests
2021-07-15 14:48:01.994208: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 1} exhausted max_num_retries: 0
2021-07-15 14:48:01.994232: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 1}
2021-07-15 14:48:02.003695: I tensorflow_serving/model_servers/server.cc:371] Running gRPC ModelServer at 0.0.0.0:15000 ...
[warn] getaddrinfo: address family for nodename not supported
2021-07-15 14:48:02.006269: I tensorflow_serving/model_servers/server.cc:391] Exporting HTTP/REST API at:localhost:15001 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
WARNING:urllib3.connectionpool:Retrying (Retry(total=6, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0400b29a10>: Failed to establish a new connection: [Errno 111] Connection refused')': /v1/models/model
INFO:tfs_utils:<Response [200]>
INFO:tfs_utils:model: http://localhost:15001/v1/models/model is available now
INFO:__main__:nginx config:
load_module modules/ngx_http_js_module.so;
worker_processes auto;
daemon off;
pid /tmp/nginx.pid;
error_log /dev/stderr error;
worker_rlimit_nofile 4096;
events {
    worker_connections 2048;
}
http {
    include /etc/nginx/mime.types;
    default_type application/json;
    access_log /dev/stdout combined;
    js_include tensorflow-serving.js;
    upstream tfs_upstream {
        server localhost:15001;
    }
    upstream gunicorn_upstream {
        server unix:/tmp/gunicorn.sock fail_timeout=1;
    }
    server {
        listen 8080 deferred;
        client_max_body_size 0;
        client_body_buffer_size 100m;
        subrequest_output_buffer_size 100m;
        set $tfs_version 2.4;
        set $default_tfs_model model;
        location /tfs {
            rewrite ^/tfs/(.*) /$1 break;
            proxy_redirect off;
            proxy_pass_request_headers off;
            proxy_set_header Content-Type 'application/json';
            proxy_set_header Accept 'application/json';
            proxy_pass http://tfs_upstream;
        }
        location /ping {
            proxy_pass http://gunicorn_upstream/ping;
        }
        location /invocations {
            proxy_pass http://gunicorn_upstream/invocations;
        }
        location /models {
            proxy_pass http://gunicorn_upstream/models;
        }
        location / {
            return 404 '{"error": "Not Found"}';
        }
        keepalive_timeout 3;
    }
}
INFO:__main__:gunicorn command: gunicorn -b unix:/tmp/gunicorn.sock -k gevent --chdir /sagemaker --workers 1 --threads 1 --pythonpath /opt/ml/model/code,/opt/ml/model/code/lib -e TFS_GRPC_PORT_RANGE=15000-15002 -e TFS_REST_PORT_RANGE=15001-15003 -e SAGEMAKER_MULTI_MODEL=False -e SAGEMAKER_SAFE_PORT_RANGE=15000-15999 -e SAGEMAKER_TFS_WAIT_TIME_SECONDS=300 python_service:app
INFO:__main__:gunicorn version info:
gunicorn (version 20.0.4)
INFO:__main__:started gunicorn (pid: 72)
[2021-07-15 14:48:02 +0000] [72] [INFO] Starting gunicorn 20.0.4
[2021-07-15 14:48:02 +0000] [72] [INFO] Listening at: unix:/tmp/gunicorn.sock (72)
INFO:__main__:gunicorn server is ready!
[2021-07-15 14:48:02 +0000] [72] [INFO] Using worker: gevent
[2021-07-15 14:48:02 +0000] [76] [INFO] Booting worker with pid: 76
INFO:__main__:nginx version info:
nginx version: nginx/1.20.0
built by gcc 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
built with OpenSSL 1.1.1 11 Sep 2018
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.20.0/debian/debuild-base/nginx-1.20.0=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'
INFO:__main__:started nginx (pid: 77)
INFO:python_service:Creating grpc channel for port: 15000
[2021-07-15 14:48:03 +0000] [76] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/ggevent.py", line 162, in init_process
    super().init_process()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 49, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.7/site-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/sagemaker/python_service.py", line 414, in <module>
    resources = ServiceResources()
  File "/sagemaker/python_service.py", line 400, in __init__
    self._python_service_resource = PythonServiceResource()
  File "/sagemaker/python_service.py", line 83, in __init__
    self._handler, self._input_handler, self._output_handler = self._import_handlers()
  File "/sagemaker/python_service.py", line 278, in _import_handlers
    spec.loader.exec_module(inference)
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/ml/model/code/inference.py", line 1, in <module>
    import librosa
ModuleNotFoundError: No module named 'librosa'
[2021-07-15 14:48:03 +0000] [76] [INFO] Worker exiting (pid: 76)
[2021-07-15 14:48:03 +0000] [72] [INFO] Shutting down: Master
[2021-07-15 14:48:03 +0000] [72] [INFO] Reason: Worker failed to boot.
The code I used is this:
from sagemaker.tensorflow.serving import Model

model = Model(entry_point='inference.py',
              dependencies=['requirements.txt'],
              model_data=bucket,
              role=role,
              sagemaker_session=sagemaker_session,
              framework_version='2.4.1')
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
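For reference, the traceback above fails on "import librosa", so the requirements.txt has to list at least that package (hypothetical contents; the real file may pin versions or include more dependencies):

librosa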
Since you have already trained your model outside of SageMaker, you want to focus on just deployment/inference. That means storing your model artifacts in S3 as a tar.gz archive. The correct API call to work with is the following code block:
from sagemaker.tensorflow import TensorFlowModel
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
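For custom inference code, the layout of the archive matters. Matching the paths in the log above (/opt/ml/model/000000001 for the SavedModel and /opt/ml/model/code/inference.py for the handler), the expected structure is roughly this sketch:

model.tar.gz
├── 000000001/
│   ├── saved_model.pb
│   └── variables/
└── code/
    ├── inference.py
    └── requirements.txt

The container pip-installs code/requirements.txt when the endpoint starts, so the ModuleNotFoundError for librosa suggests the requirements file never ended up in that code/ directory of the archive.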
Check out more information at the following link: https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html#deploy-tensorflow-serving-models
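If you would rather let the SDK repackage inference.py and requirements.txt for you instead of building the archive by hand, TensorFlowModel also accepts entry_point and source_dir. A sketch, assuming a hypothetical local code/ directory that holds both files:

from sagemaker.tensorflow import TensorFlowModel

# 'code' is a hypothetical local directory containing inference.py
# and requirements.txt; the SDK repacks it into the model archive.
model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz',
                        role='MySageMakerRole',
                        entry_point='inference.py',
                        source_dir='code',
                        framework_version='2.4.1')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')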
For preprocessing there are two approaches you can take. The first is the inference.py handler interface you are already using (a sketch follows below); the second is bringing your own container, covered in this example:
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/tensorflow_bring_your_own
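With the handler interface, the TensorFlow Serving container imports inference.py from the archive's code/ directory and calls two hooks around every request. A minimal sketch of the documented input_handler/output_handler signatures (the JSON passthrough is a placeholder for real preprocessing, e.g. your librosa feature extraction):

def input_handler(data, context):
    # Pre-process the request before it is forwarded to TensorFlow Serving.
    if context.request_content_type == 'application/json':
        payload = data.read().decode('utf-8')
        return payload if len(payload) else ''
    raise ValueError('Unsupported content type: {}'.format(context.request_content_type))

def output_handler(response, context):
    # Post-process the TensorFlow Serving response before returning it to the client.
    if response.status_code != 200:
        raise ValueError(response.content.decode('utf-8'))
    return response.content, context.accept_header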
I work for AWS & my opinions are my own