How to read a csv file from s3 bucket using lambda function and boto3?

I have an S3 bucket and I have set up a Lambda function that should display the contents of a CSV file whenever one is uploaded to the bucket. The S3 bucket is already configured as a trigger for my Lambda function. Can you please advise?

An AWS Lambda function is code that you write. You can make it do anything you wish.

For your first scenario of displaying a CSV file in CloudWatch Logs, the Lambda function should:

  • Retrieve the name of the bucket and object from the event passed to the Lambda function
  • Download the file from Amazon S3 to the /tmp/ directory
  • Read the CSV using normal Python code and print() the information that you wish to appear in CloudWatch Logs
  • Delete the temporary file, so as not to consume too much disk space (by default there is a 512 MB limit on temporary disk space, and Lambda containers can be reused multiple times)
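The read-and-print step could be sketched as below. The helper name `print_csv_contents` is purely illustrative (it is not part of any AWS API); inside the handler you would call it after downloading the object with boto3's `download_file`, as in the code skeleton further down.

```python
import csv
import os

def print_csv_contents(local_path):
    """Print every row of a CSV file; inside Lambda, each print()
    call produces a line in CloudWatch Logs."""
    with open(local_path, newline='') as f:
        for row in csv.reader(f):
            print(row)

    # Remove the temporary file afterwards so a reused ("warm")
    # Lambda container does not slowly fill up /tmp
    os.remove(local_path)
```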

For your second question of "adding an extra column", the Lambda function should:

  • Retrieve the name of the bucket and object from the event passed to the Lambda function
  • Download the file from Amazon S3 to the /tmp/ directory
  • Manipulate the contents of the file however you wish, using Python code
  • Upload the file to Amazon S3
  • Delete the temporary file, so as not to consume too much disk space (by default there is a 512 MB limit on temporary disk space, and Lambda containers can be reused multiple times)

The code would look something like:

import urllib.parse
import boto3

# Create the S3 client outside the handler so it can be
# reused across invocations of a warm container
s3_client = boto3.client('s3')

def lambda_handler(event, context):

    # Get the bucket and object key from the S3 event notification
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    local_filename = '/tmp/file.txt'

    # Download the file from S3 to the local filesystem
    s3_client.download_file(bucket, key, local_filename)

    # Do stuff here with the local file (your code here!)

    # Upload the modified file back to S3
    s3_client.upload_file(local_filename, bucket, key)
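For the "extra column" case, the "Do stuff here" section could be filled in with plain `csv`-module code that appends a column to every row. This is only a sketch: `add_extra_column` is an illustrative helper (not an AWS API), it assumes the file has a header row, and the column name and value are placeholders for your own logic.

```python
import csv

def add_extra_column(input_path, output_path, column_name, value):
    """Copy a CSV file, appending one constant extra column to every row.
    Assumes the first row of the input file is a header."""
    with open(input_path, newline='') as src, \
         open(output_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        # Extend the header with the new column name
        header = next(reader)
        writer.writerow(header + [column_name])

        # Extend every data row with the new value
        for row in reader:
            writer.writerow(row + [value])
```

In the handler you would call something like `add_extra_column('/tmp/file.txt', '/tmp/out.csv', 'processed', 'yes')` between the download and the upload, then pass `/tmp/out.csv` to `upload_file` (and remember to delete both temporary files).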
