
How to run a Lambda function for multiple files in AWS S3

I have the following Lambda function that starts my AWS Glue job whenever a new object is created in an S3 bucket.

import json
import boto3
from urllib.parse import unquote_plus

def lambda_handler(event, context):

  # Extract the bucket and object key from the S3 event notification;
  # the key is URL-encoded, so decode it with unquote_plus.
  bucketName = event["Records"][0]["s3"]["bucket"]["name"]
  fileNameFull = event["Records"][0]["s3"]["object"]["key"]
  fileName = unquote_plus(fileNameFull)

  print(bucketName, fileName)

  glue = boto3.client('glue')

  # Start one Glue job run, passing the bucket and decoded key as job arguments.
  response = glue.start_job_run(
    JobName = 'My_Job_Glue',
    Arguments = {
      '--s3_target_path_key': fileName,
      '--s3_target_path_bucket': bucketName
    }
  )

  return {
    'statusCode': 200,
    'body': json.dumps('Hello from Lambda!')
  }
 

At first this works well and it partially does what I need. In practice, though, there is always more than one new file for the same bucket, and the current logic starts a Glue job run for every single object (if 3 files arrive, I get 3 Glue job runs). How can I improve the function so that my Glue script runs only once all the new data has been identified? Today I have Kafka Connect configured to write batches of 5000 records, and if that batch size is not reached within a few minutes it flushes whatever records it has.

S3 can send its event notifications to a Simple Queue Service (SQS) queue instead of invoking Lambda directly. With SQS as the Lambda event source, messages are delivered to the function in batches, so a single invocation can handle several new objects at once.
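
As a rough sketch (assuming the bucket's notifications go to an SQS queue and that queue is configured as the Lambda trigger, with the batch size and batching window set on the event source mapping), the handler could collect every object key in the batch and start a single Glue job run. The '--s3_target_path_keys' argument and the comma-separated key list are assumptions here, not something your Glue job already expects; the job script would have to split that list itself.

import json
import boto3
from urllib.parse import unquote_plus

glue = boto3.client('glue')

def lambda_handler(event, context):

  # With SQS as the trigger, each record body holds one S3 event notification.
  keys = []
  bucketName = None
  for sqsRecord in event["Records"]:
    body = json.loads(sqsRecord["body"])
    # S3 test events have no "Records" key, so default to an empty list.
    for s3Record in body.get("Records", []):
      bucketName = s3Record["s3"]["bucket"]["name"]
      keys.append(unquote_plus(s3Record["s3"]["object"]["key"]))

  if not keys:
    return {
      'statusCode': 200,
      'body': json.dumps('No S3 records in this batch')
    }

  # One Glue job run for the whole batch instead of one per file.
  # '--s3_target_path_keys' is a hypothetical argument: the Glue script
  # would need to split the comma-separated list on its side.
  response = glue.start_job_run(
    JobName = 'My_Job_Glue',
    Arguments = {
      '--s3_target_path_keys': ','.join(keys),
      '--s3_target_path_bucket': bucketName
    }
  )

  return {
    'statusCode': 200,
    'body': json.dumps(response['JobRunId'])
  }

How many objects end up in one invocation is controlled by the event source mapping's batch size and maximum batching window, so you can tune those to roughly match the pace at which Kafka Connect writes files.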
