
How to run a Lambda function for multiple files in AWS S3

I have the following Lambda function that runs my AWS Glue script whenever a new object lands in an S3 bucket.

import json
import boto3
from urllib.parse import unquote_plus

def lambda_handler(event, context):

  # Pull the bucket name and object key out of the S3 event notification
  bucketName = event["Records"][0]["s3"]["bucket"]["name"]
  fileNameFull = event["Records"][0]["s3"]["object"]["key"]
  fileName = unquote_plus(fileNameFull)  # keys arrive URL-encoded

  print(bucketName, fileName)

  glue = boto3.client('glue')

  # Start one Glue job run for this single object
  response = glue.start_job_run(
    JobName = 'My_Job_Glue',
    Arguments = {
      '--s3_target_path_key': fileName,
      '--s3_target_path_bucket': bucketName
    }
  )

  return {
    'statusCode': 200,
    'body': json.dumps('Hello from Lambda!')
  }
 

At first this works great and I'm partially getting what I need. What actually happens, though, is that more than one file always arrives for the same bucket, and the current logic runs my Glue script once per occurrence (if 3 files arrive, I get 3 Glue job runs). How could I improve the function so that the script runs only once all the new data has been identified? Today I have Kafka Connect configured to batch 5000 records, and if the batch isn't filled within a few minutes it flushes however many records it has.

S3 allows you to notify Lambda functions using Simple Queue Service (SQS). Using SQS allows you to batch messages to Lambda.
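A minimal sketch of what the SQS-triggered handler could look like, assuming the bucket's event notifications are sent to an SQS queue that is configured as the Lambda's event source. Passing the keys as one comma-separated --s3_target_path_key argument is an assumption; My_Job_Glue would need to split that list on its side.

import json
import boto3
from urllib.parse import unquote_plus

glue = boto3.client('glue')

def lambda_handler(event, context):

  # Each SQS record body is an S3 event notification (a JSON string),
  # which itself can carry several S3 records
  bucketName = None
  keys = []
  for sqsRecord in event["Records"]:
    body = json.loads(sqsRecord["body"])
    for s3Record in body.get("Records", []):  # absent for s3:TestEvent
      bucketName = s3Record["s3"]["bucket"]["name"]
      keys.append(unquote_plus(s3Record["s3"]["object"]["key"]))

  if not keys:
    return {'statusCode': 200, 'body': json.dumps('No S3 records in batch')}

  # One Glue job run for the whole batch instead of one per file
  response = glue.start_job_run(
    JobName = 'My_Job_Glue',
    Arguments = {
      '--s3_target_path_key': ','.join(keys),
      '--s3_target_path_bucket': bucketName
    }
  )

  return {
    'statusCode': 200,
    'body': json.dumps(f"Started Glue run for {len(keys)} file(s)")
  }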

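How many messages land in a single invocation is controlled on the event source mapping, not in the handler. A hypothetical boto3 sketch (the queue ARN and function name below are placeholders):

import boto3

lambda_client = boto3.client('lambda')

# Invoke with up to 100 messages, or with whatever has accumulated
# after 60 seconds, whichever comes first
lambda_client.create_event_source_mapping(
  EventSourceArn = 'arn:aws:sqs:us-east-1:123456789012:my-upload-queue',
  FunctionName = 'my-batch-glue-trigger',
  BatchSize = 100,
  MaximumBatchingWindowInSeconds = 60
)

With MaximumBatchingWindowInSeconds set, Lambda waits up to that long (or until BatchSize messages have arrived) before invoking, which mirrors the "batch of 5000, or whatever arrived after a few minutes" behaviour you already have on the Kafka Connect side.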