Parsing Base64-encoded data containing an image from AWS Lambda using Python

Question

I have a Lambda function set up with a POST method that should receive an image as multipart/form-data, load the image, do some calculations, and return a simple array of numbers. The Lambda function sits behind an API Gateway with Lambda-Proxy integration enabled and multipart/form-data set as a Binary Media Type.

However, I can't for the life of me figure out how to parse the multipart form data that comes back from AWS Lambda.

The event['body'] contains base64-encoded data that I can't post here because it takes up too much space.

I use the following snippet of code to parse the multipart form data:

import base64

from requests_toolbelt.multipart import decoder

# body is assumed to be event['body'], which arrives base64-encoded
multipart_string = base64.b64decode(body)
content_type = data['event']['headers']['Content-Type']
multipart_data = decoder.MultipartDecoder(multipart_string, content_type)

where content_type is 'multipart/form-data; boundary=--------------------------881952313555430391739156'.

Running through the parts of multipart_data like this:

for part in multipart_data.parts:
    print(part.content)
    print(part.headers)

gives the following. The content (too long to post) looks like this:

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\ ... x00\x7f\xff\xd9'

and the headers:

{b'Content-Disposition': b'form-data; name="image"; filename="8281460-3x2-700x467.jpg"', b'Content-Type': b'image/jpeg'}

However, it is still not clear to me: a) What part of the content is the actual image? b) How can I extract the image and, e.g., get it into PIL with Image.open?

Supplementary information:

Here is the small Flask app I use to POST the image and return the event data:

import json

from flask import Flask, request 

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def hello(event, context):

    response = {
        "statusCode": 200,
        "event": event
    }

    return {
        "body": json.dumps(response),
    }

and here is the Postman request as Python code:

import requests

url = "url-to-lambda-function"

payload = "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\nContent-Disposition: form-data; name=\"image\"; filename=\"8281460-3x2-700x467.jpg\"\r\nContent-Type: image/jpeg\r\n\r\n\r\n------WebKitFormBoundary7MA4YWxkTrZu0gW--"
headers = {
    'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
    'User-Agent': "PostmanRuntime/7.18.0",
    'Accept': "*/*",
    'Cache-Control': "no-cache",
    'Content-Type': "multipart/form-data; boundary=--------------------------881952313555430391739156",
    'Accept-Encoding': "gzip, deflate",
    'Content-Length': "30417",
    'Connection': "keep-alive",
    'cache-control': "no-cache"
    }

response = requests.request("POST", url, data=payload, headers=headers)

print(response.text)

Answer 1

For anyone coming here, this is how I ended up solving it:

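    import base64
    import io

    import numpy as np
    from PIL import Image
    from requests_toolbelt.multipart import decoder

    # event is the Lambda proxy event; its body arrives base64-encoded
    body = event["body"]
    content_type = event["headers"]["Content-Type"]
    body_dec = base64.b64decode(body)

    multipart_data = decoder.MultipartDecoder(body_dec, content_type)

    # Collect the raw bytes of every form part
    binary_content = []
    for part in multipart_data.parts:
        binary_content.append(part.content)

    # In this request the first part holds the image file
    imageStream = io.BytesIO(binary_content[0])
    imageFile = Image.open(imageStream)
    imageArray = np.array(imageFile)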

This yields an array that you can work with. For me, the difficulty was in understanding how the multipart/form-data was stitched back together.
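For readers who want to see how the pieces fit without installing anything, here is a minimal sketch that uses only the standard library. It is not the method used above: it swaps in Python's email package to split the parts instead of requests_toolbelt. The helper name parse_multipart_event, the sample boundary, the filename, and the fake JPEG bytes are all made up for illustration, and the sketch assumes the proxy event carries an isBase64Encoded flag next to body.

```python
import base64
from email import message_from_bytes


def parse_multipart_event(event):
    """Return a list of (field name, filename, content bytes) tuples."""
    body = event["body"]
    # Assumption: the proxy event flags binary payloads with isBase64Encoded.
    if event.get("isBase64Encoded", False):
        raw = base64.b64decode(body)
    else:
        raw = body.encode("utf-8")

    content_type = event["headers"]["Content-Type"]
    # The email parser expects a header block in front of the multipart body.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = message_from_bytes(prefix + raw)
    if not message.is_multipart():
        raise ValueError("body is not multipart data")

    parts = []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # decode=True returns the part body as bytes
        parts.append((name, part.get_filename(), part.get_payload(decode=True)))
    return parts


# Hypothetical sample request: one form field called "image".
fake_jpeg = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\xff\xd9"
raw_body = (
    b"--example-boundary\r\n"
    b'Content-Disposition: form-data; name="image"; filename="photo.jpg"\r\n'
    b"Content-Type: image/jpeg\r\n\r\n"
    + fake_jpeg
    + b"\r\n--example-boundary--\r\n"
)
sample_event = {
    "headers": {"Content-Type": "multipart/form-data; boundary=example-boundary"},
    "isBase64Encoded": True,
    "body": base64.b64encode(raw_body).decode("ascii"),
}

name, filename, content = parse_multipart_event(sample_event)[0]
# A JPEG file starts with the bytes FF D8 and ends with FF D9.
is_jpeg = content.startswith(b"\xff\xd8") and content.endswith(b"\xff\xd9")
```

This also answers question a) from the post: the whole content of the part is the image file, which is why it can go straight into io.BytesIO and Image.open.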

Answer 2

The AWS documentation says that the maximum payload size for a (REST) API Gateway is 10 MB. You did not provide your image size, but if it is more than 10 MB, consider redesigning your architecture. I would suggest uploading the image to S3 instead: your Lambda function returns a signed URL, and once the image has been uploaded to S3, you can fetch the object inside your Lambda function and do your calculations. https://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjectPreSignedURLDotNetSDK.html

Answer 3

To add to tmo's answer: my multipart/form-data posts (to an AWS Lambda with API Gateway proxy integration) required that I access the content-type header like this instead:

content_type = event['multiValueHeaders']['Content-Type'][0]

and then access the parts of the form data from tmo's binary_content list with:

...
file_content = binary_content[0]
filename = str(binary_content[1].decode())
team_id = str(binary_content[2].decode())
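Since the header can show up under headers or multiValueHeaders, and with different casing depending on the client (the Postman export in the question sends both 'content-type' and 'Content-Type'), a small lookup helper may save some guessing. This is only a sketch: get_header is a hypothetical name, and it assumes the event is a plain dict shaped like the proxy events shown in this thread.

```python
def get_header(event, name):
    """Look up a request header case-insensitively in a proxy event."""
    wanted = name.lower()
    # Try the single-value headers first.
    for key, value in (event.get("headers") or {}).items():
        if key.lower() == wanted:
            return value
    # Fall back to multiValueHeaders, where each value is a list.
    for key, values in (event.get("multiValueHeaders") or {}).items():
        if key.lower() == wanted and values:
            return values[0]
    return None
```

With that, content_type = get_header(event, "Content-Type") works for both event shapes and returns None when the header is missing.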

Answer 1
2, accepted, 2019-12-16 09:08:42

Answer 2
1, 2019-12-12 12:15:59

Answer 3
0, 2020-01-13 12:24:29
