
How to debug invocation timeout error in SageMaker batch transform?

I am experimenting with SageMaker, using a container from the list here, https://github.com/aws/deep-learning-containers/blob/master/available_images.md, to run my model, and overriding the model_fn and predict_fn functions in an inference.py file for model loading and prediction, as shown in this example: https://github.com/PacktPublishing/Learn-Amazon-SageMaker-second-edition/blob/main/Chapter%2007/huggingface/src/torchserve-predictor.py. I keep getting an invocation timeout error: "Model server did not respond to /invocations request within 3600 seconds". Am I missing anything in my inference.py code, such as adding something to respond to the ping/health check?

File: inference.py

import json
import torch
from transformers import AutoConfig, AutoTokenizer, DistilBertForSequenceClassification

JSON_CONTENT_TYPE = 'application/json'

def model_fn(model_dir):
    config_path = '{}/config.json'.format(model_dir)
    model_path =  '{}/pytorch_model.bin'.format(model_dir)
    config = AutoConfig.from_pretrained(config_path)
    ...

def predict_fn(input_data, model):
    # return predictions
    ...

The issue is not with the health checks. It is with the container not responding to the /invocations request, which can happen when the model takes longer than expected to produce predictions from the input data.
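If each request really does take that long, the usual mitigations are to raise the per-invocation timeout (the CreateTransformJob API's ModelClientConfig allows up to 3600 seconds) and to shrink each request so the model server has less work per /invocations call. A minimal sketch of the relevant request fields, using the SageMaker API's own field names; the bucket and values are illustrative, not from the post:

```python
# Hypothetical fragment of a boto3 create_transform_job request.
# Field names come from the SageMaker CreateTransformJob API; the
# S3 URI and numeric values are illustrative assumptions.
transform_job_config = {
    "ModelClientConfig": {
        "InvocationsTimeoutInSeconds": 3600,  # API maximum; cannot be raised further
        "InvocationsMaxRetries": 1,
    },
    "BatchStrategy": "SingleRecord",  # one record per /invocations request
    "MaxPayloadInMB": 6,              # cap the size of each request payload
    "TransformInput": {
        "SplitType": "Line",          # split a JSON Lines input file per line
        "ContentType": "application/json",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input.jsonl",  # hypothetical location
            }
        },
    },
}
```

With `SplitType` and `BatchStrategy` set this way, each /invocations call carries a single record, so a slow model produces a response per record instead of one huge batched response that overruns the timeout.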

