Request body format problem when calling a SageMaker endpoint with invoke_endpoint from a Lambda function

Question

I have a deployed SageMaker endpoint. When I test it with Predictor.predict, the endpoint works fine: I can pass it JSON in whatever format and it processes it correctly. However, I have been struggling to call the endpoint from Lambda using client.invoke_endpoint.

I am trying to modify my request body to follow the format given in this AWS documentation.

let request = {
  // Instances might contain multiple rows that predictions are sought for.
  "instances": [
    {
      // Request and algorithm specific inference parameters.
      "configuration": {},
      // Data in the specific format required by the algorithm.
      "data": {
         "<field name>": dataElement
       }
    }
  ]
}

I am not sure what the configuration should be, so this is what my request body looks like.

{
  "instances": [
    {
      "data": {
        "ID": "some ID",
        "ACCOUNT": null,
        "LEAD": "some ID",
        "FORMNAME": "some Form",
        "UTMMEDIUM": "some Medium",
        "UTMSOURCE": "some Source"
      }
    },
    {
      "data": {
        "ID": "some ID",
        "ACCOUNT": "some ID",
        "LEAD": null,
        "FORMNAME": "some Form",
        "UTMMEDIUM": null,
        "UTMSOURCE": null
      }
    }
  ]
}

This is my Lambda function:

import os
import io
import boto3
import json

# grab environment variables
ENDPOINT_NAME = 'xxxxx'

client = boto3.client('sagemaker-runtime')
def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    data = json.loads(json.dumps(event))
    payload = str(data["instances"])
    response = client.invoke_endpoint(EndpointName = ENDPOINT_NAME,
                                      Body = payload,
                                      ContentType = 'application/json',
                                      Accept = 'application/json')
    print(response)
    return 'nothing'

I am able to invoke the endpoint using the code above, but the endpoint keeps having trouble processing the input. The input_fn in the endpoint looks like this:

def input_fn(input_data, content_type):

    if content_type == 'text/csv':
        # Read the raw input data as CSV. 
        df = pd.read_csv(StringIO(input_data))

        return df
    
    elif content_type == 'application/json':
        print('input_fn (elif): input_data')
        print(input_data)
        print(type(input_data))
        print('input_fn (elif): input_data eval')
        print(eval(input_data))
        print('input_fn (elif): input_data eval type')
        print(type(eval(input_data)))
        df = pd.read_json(eval(input_data))
        print('input_fn (elif): df.columns')
        print(df.columns)
        return df
    
    else:
        raise ValueError("{} not supported by script!".format(content_type))

The error I get is ValueError: Invalid file path or buffer object type: <class 'list'>

The type of input_data is a string, and the type of eval(input_data) is a list.

I would appreciate any insight! I have tried many different things, including removing eval from my input_fn, or replacing pd.read_json with json.loads(json.dumps()) combined with pd.DataFrame.from_dict. I have gotten different errors, such as json.decoder.JSONDecodeError: Expecting value: Line Column 42 (column 42 is where the null is located), and unhashable type: 'dict'.

I am really confused and not sure what to do next. Thank you!

Answer 1

This would involve a bit more debugging, but I recommend replacing instances of input_data with input_data[0], and also trying to decode it, since it could be byte encoded. Maybe something like the line below. The reason for this recommendation is described at the bottom and might help. Also, use logging instead of print so that you can see the value of input_data in the CloudWatch logs.

input_data = (input_data[0].get('data') or input_data[0].get('body')).decode('utf-8')

Reason for recommendation:

In SageMaker, when using PyTorch-type models:

The input_data is a list, and each entry corresponds to the data received from one request.
The size of the list is the number of requests in it, which also corresponds to the batch size if TorchServe is in use. By default batch_size is 1 in TorchServe, so input_data is a single-element list.
There is also a chance that you get byte-encoded data.
You might also refer to https://pytorch.org/serve/custom_service.html

This should help explain the ValueError: Invalid file path or buffer object type: <class 'list'> and the JSON decode error.
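The decoding step suggested above can be sketched without eval. This is not the answerer's code; it is a minimal illustration that assumes the request body reaches input_fn as JSON text (str or bytes), either wrapped in an "instances" key or as a bare list. The helper name parse_instances and the sample body are hypothetical.

```python
import json

def parse_instances(input_data):
    """Return a list of flat row dicts from a JSON request body (hypothetical helper)."""
    # The body may arrive as bytes; decode before parsing.
    if isinstance(input_data, (bytes, bytearray)):
        input_data = input_data.decode("utf-8")
    parsed = json.loads(input_data)  # parses JSON null/true/false, unlike eval()
    # Accept either {"instances": [...]} or a bare list of instances.
    instances = parsed["instances"] if isinstance(parsed, dict) else parsed
    # Unwrap the optional "data" key so each row is a flat dict.
    return [inst.get("data", inst) for inst in instances]

# Made-up sample body, shaped like the request in the question.
body = b'{"instances": [{"data": {"ID": "a", "LEAD": null}}]}'
rows = parse_instances(body)
print(rows)
```

Inside input_fn, the resulting rows could then be passed to pd.DataFrame(rows) rather than pd.read_json, which expects a path, buffer, or JSON string and not a list.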

Answer 2

The issue you are having is in this line:

payload = str(data["instances"])
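To see why that line is a problem, compare str() with json.dumps() on a small list. This is a minimal sketch with made-up sample data; the field values are illustrative only.

```python
import json

# Hypothetical sample rows, shaped like the "instances" list in the question.
instances = [{"data": {"ID": "some ID", "ACCOUNT": None}}]

as_repr = str(instances)         # Python repr: single quotes and None
as_json = json.dumps(instances)  # JSON text: double quotes and null

print(as_repr)
print(as_json)

# The JSON text round-trips; the repr does not parse as JSON.
assert json.loads(as_json) == instances
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False
print("repr parses as JSON:", repr_is_json)
```

Because the payload sent by the Lambda is a Python repr, eval in input_fn happens to turn it back into a list, and pd.read_json then rejects that list with the "Invalid file path or buffer object type" error.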

str() does not produce JSON, so serialize the payload with json.dumps instead. The following code block does that on the boto3 invocation side and also decodes the response properly.

import json
import boto3

runtime = boto3.client('sagemaker-runtime')

# event is the test request you are sending (already a dict inside Lambda)
payload = json.dumps(event)  # serialize to real JSON text
response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                   ContentType='application/json',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())  # decode response

To process the data, you want appropriate exception handling for non-JSON input. Use the following code to parse the request on the Flask end.

input_json = flask.request.get_json()
model_input = input_json['input']  # whatever your input JSON key is
predictions = model.predict(model_input)
# Transform predictions to JSON
result = {
        'output': predictions
}
resultjson = json.dumps(result)
return flask.Response(response=resultjson, status=200, mimetype='application/json')
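The exception handling for non-JSON input mentioned above is not shown in the snippet. One way it might look, kept independent of Flask so it runs on its own, is sketched below. The helper name parse_json_body and the choice of a 400 status are assumptions, not part of the original answer.

```python
import json

def parse_json_body(raw_body):
    """Return (status_code, payload); 400 when the body is not valid JSON."""
    try:
        if isinstance(raw_body, (bytes, bytearray)):
            raw_body = raw_body.decode("utf-8")
        return 200, json.loads(raw_body)
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Report the parse failure instead of letting the handler crash.
        return 400, {"error": f"request body is not valid JSON: {exc}"}

# A well-formed body, and a Python repr of the kind str() would produce.
ok_status, ok_payload = parse_json_body(b'{"input": [1, 2, 3]}')
bad_status, bad_payload = parse_json_body("[{'input': None}]")
print(ok_status, ok_payload)
print(bad_status, bad_payload["error"])
```

In a Flask handler, the returned status and payload could be passed to flask.Response in the same way as the success case above.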

I am contributing this on behalf of my employer, AWS. My contribution is licensed under the MIT license. See here for a more detailed explanation:

https://aws-preview.aka.amazon.com/tools/stackoverflow-samples-license/
