
How to create an AWS Lambda/API Gateway Python function that takes a PDF file as input using multipart/form-data?

I have been struggling with this for a while now. I need to create a resource in API Gateway linked to a Lambda function that takes a PDF file as input, sent as a multipart/form-data POST request. To keep it simple, I am just returning the file for now.

When I call the API with the following curl command, I get an "Internal server error" from AWS. Has anyone ever succeeded in sending a PDF file to Lambda without having to use the S3 trick (upload to S3)?

Thank you all in advance for any hint.

Commands/Files:

curl

curl -vvv -X POST -H "Content-Type: multipart/form-data" -F "content=@file.pdf" https://...MYAPIHERE.../pdf

I am currently using the Serverless Framework and Python 3.

Below are my files:

serverless.yaml

functions:
  pdf:
    handler: handler.pdf
    events:
      - http:
          path: /pdf
          method: post 
          integration: lambda
          request:
            template:
              application/json: "$input.json('$')"
          response:
            headers:
              Content-Type: "'application/json'"

handler.py

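import json


def pdf(event, context):
    # With "integration: lambda", event is whatever the request
    # mapping template produced, so 'content' may be missing (None).
    pdf = event.get('content')
    # For now, just echo the input back in a JSON response body.
    out = {'statusCode': 200,
           'isBase64Encoded': False,
           'headers': {"content-type": "application/json"},
           'body': json.dumps({
               'input':  pdf,
               'inputType': 'url',
               #'tags': list(tags.keys()),
               'error': None})}
    return out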

I finally managed to solve this after a lot of googling and with the help of the AWS support team.

It turns out that API Gateway checks the "Content-Type" or "Accept" header of the incoming request and matches it against the configured Binary Media Types to decide which payloads are treated as binary. That means both content types (multipart/form-data and application/pdf) need to be registered as binary media types.
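For context, the Binary Media Types setting corresponds to the binaryMediaTypes list on the REST API, which can also be changed through patch operations. The sketch below only builds those operations. The helper name binary_media_type_patches is made up for illustration, and the commented boto3 call uses a placeholder REST API id; the API would also need to be redeployed afterwards for the change to take effect.

```python
def binary_media_type_patches(media_types):
    """Build patch operations that register each MIME type as binary.

    The "/" inside a MIME type is escaped as "~1" in the patch path,
    e.g. multipart/form-data -> /binaryMediaTypes/multipart~1form-data.
    """
    return [
        {"op": "add", "path": "/binaryMediaTypes/" + mt.replace("/", "~1")}
        for mt in media_types
    ]


patches = binary_media_type_patches(["multipart/form-data", "application/pdf"])
print(patches)

# Assumed usage with boto3 (not run here; "REST_API_ID" is a placeholder):
# import boto3
# boto3.client("apigateway").update_rest_api(
#     restApiId="REST_API_ID", patchOperations=patches)
```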

This can be done in Serverless with the serverless-apigw-binary plugin, by adding the following to serverless.yaml:

plugins:
  - serverless-apigw-binary 

custom:
  apigwBinary:
    types:           #list of mime-types
      - 'multipart/form-data'
      - 'application/pdf'

But since Lambda expects the payload from API Gateway in application/json format, the binary data cannot be passed through directly. Therefore contentHandling should be set to "CONVERT_TO_TEXT", which makes API Gateway hand the binary body over as a Base64-encoded string. In the yaml file this translates into:

contentHandling: CONVERT_TO_TEXT
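With CONVERT_TO_TEXT the handler does not see raw bytes. It gets the request body as Base64 text and has to decode it and pick the file out of the multipart payload itself. Here is a minimal sketch of that step, assuming the Base64 string arrives as event['body'] and the form field is named content, as in the curl command from the question. extract_form_file is a hypothetical helper, and the parsing is deliberately simplified (it reads the boundary from the first line of the body and ignores nested parts and folded headers):

```python
import base64


def extract_form_file(body_b64, field="content"):
    """Return the raw bytes of one multipart/form-data field."""
    raw = base64.b64decode(body_b64)
    # The first line of a multipart body is "--<boundary>".
    delimiter = b"\r\n" + raw.split(b"\r\n", 1)[0]
    wanted = b'; name="' + field.encode() + b'"'
    # Prepend CRLF so the first delimiter splits like the others.
    for part in (b"\r\n" + raw).split(delimiter)[1:]:
        if part.startswith(b"--"):  # closing delimiter "--<boundary>--"
            break
        headers, _, data = part.partition(b"\r\n\r\n")
        if wanted in headers:
            return data
    raise KeyError(field)


def pdf(event, context):
    # Assumption: the mapping template put the Base64 text under "body".
    content = extract_form_file(event["body"])
    return {"size": len(content), "is_pdf": content.startswith(b"%PDF")}


# Local check with a hand-built multipart body (boundary "XYZ" is arbitrary).
sample = (
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="content"; filename="file.pdf"\r\n'
    b"Content-Type: application/pdf\r\n\r\n"
    b"%PDF-1.4\r\n\x00\xff binary\r\n"
    b"--XYZ--\r\n"
)
result = pdf({"body": base64.b64encode(sample).decode()}, None)
print(result)
```

For anything beyond a single simple upload, a proper multipart parser would be a safer choice than this hand-rolled split.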

The final catch was solved by Kris Gohlson in serverless-thumbnail. Thank you for that, Kris. I just wonder how you came up with that...


serverless.yaml

plugins:
  - serverless-apigw-binary 

custom:
  apigwBinary:
    types:           #list of mime-types
      - 'multipart/form-data'
      - 'application/pdf'

functions:
  pdf:
    handler: handler.pdf
    events:
      - http:
          path: /pdf
          method: post 
          integration: lambda
          request:
            contentHandling: CONVERT_TO_TEXT
            passThrough: WHEN_NO_TEMPLATES
            template:
              application/pdf: "{'body': $input.json('$')}"
              multipart/form-data: "{'body': $input.json('$')}"
          response:
            contentHandling: CONVERT_TO_BINARY
            headers:
              Content-Type: "'application/json'"

