
AWS lambda function and Athena to create partitioned table

Here are my requirements: every day I receive a CSV file into an S3 bucket. I need to partition that data and store it as Parquet, eventually mapping a table over it. I was thinking about using an AWS Lambda function that is triggered whenever a file is uploaded, but I'm not sure what the steps are.

There are (as usual in AWS) several ways to do this. The first two that come to mind are:

  1. using a CloudWatch Event with an S3 PutObject (object-level) action as the trigger, and a Lambda function you have already created as the target.
  2. starting from the Lambda function, where it is slightly easier to add suffix-filtered triggers (e.g. for any .csv file): go to the function configuration in the Console, add a trigger in the Designer section, choose S3, and set the options you want, e.g. bucket, event type, prefix, suffix.
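The second option can also be set up programmatically. A minimal sketch using boto3, assuming the bucket name and function ARN below are placeholders for your own (and that S3 has already been granted `lambda:InvokeFunction` on the target, e.g. via `aws lambda add-permission`):

```python
# Sketch: attach an S3 "object created" notification to a bucket so that
# .csv uploads invoke an existing Lambda function.
def build_notification_config(function_arn: str, suffix: str = ".csv") -> dict:
    """Build the S3 notification configuration that invokes the Lambda
    whenever an object with the given suffix is created via PUT."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": suffix}]
                    }
                },
            }
        ]
    }

if __name__ == "__main__":
    import boto3  # kept local so the helper above stays importable anywhere

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket="my-csv-landing-bucket",  # hypothetical bucket
        NotificationConfiguration=build_notification_config(
            # hypothetical function ARN
            "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"
        ),
    )
```

This is equivalent to what the Console's Designer section does for you; the suffix filter is what keeps the function from firing on non-CSV uploads.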

In either case, you will need to write the Lambda function to do the work you have described, and it will need IAM access to the bucket to pull the files and process them.
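That function could look something like the following sketch, assuming the awswrangler (AWS SDK for pandas) layer is attached to the Lambda; the target bucket, Glue database, and table names are hypothetical. It reads the uploaded CSV, writes it back as Parquet partitioned by ingest date, and registers the table and partitions in the Glue catalog so Athena can query them:

```python
# Sketch of a Lambda handler that converts an uploaded CSV to a
# date-partitioned Parquet dataset queryable from Athena.
from datetime import date, datetime, timezone


def ingest_partition(d: date) -> dict:
    """Partition column values (zero-padded strings) for an ingest date."""
    return {"year": f"{d.year:04d}", "month": f"{d.month:02d}", "day": f"{d.day:02d}"}


def handler(event, context):
    # awswrangler import kept inside the handler so the pure helper above
    # stays importable without the Lambda layer present.
    import awswrangler as wr

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    df = wr.s3.read_csv(f"s3://{bucket}/{key}")

    # Add partition columns based on today's date (UTC).
    for col, val in ingest_partition(datetime.now(timezone.utc).date()).items():
        df[col] = val

    # dataset=True writes hive-style partitions (year=…/month=…/day=…), and
    # setting database/table creates or updates the Athena-visible table.
    wr.s3.to_parquet(
        df=df,
        path="s3://my-parquet-bucket/ingested/",  # hypothetical target bucket
        dataset=True,
        partition_cols=["year", "month", "day"],
        database="analytics",  # hypothetical Glue database
        table="daily_csv",     # hypothetical table name
        mode="append",
    )
```

The function's execution role needs read access to the source bucket, write access to the target bucket, and Glue catalog permissions for the create/update of the table and partitions.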
