Read and Copy S3 inventory data from SNS topic trigger with AWS lambda function
I am a data analyst and new to AWS Lambda functions. I have an S3 bucket where I store the inventory data from our data lake, which is generated using the Inventory feature under the S3 Management tab.
So let's say the inventory data (reports) looks like this:
s3://my-bucket/allobjects/data/report-1.csv.gz
s3://my-bucket/allobjects/data/report-2.csv.gz
s3://my-bucket/allobjects/data/report-3.csv.gz
Regardless of the file contents, I have an event set up for s3://my-bucket/allobjects/data/ which notifies an SNS topic on any event such as GET or PUT. (I can't change this workflow due to strict governance.)
Now, I am trying to create a Lambda function with this SNS topic as a trigger, and simply move the inventory-report files generated by the S3 Inventory feature under
s3://my-bucket/allobjects/data/
and repartition them as follows:
s3://my-object/allobjects/partitiondata/year=2019/month=01/day=29/report-1.csv.gz
s3://my-object/allobjects/partitiondata/year=2019/month=01/day=29/report-2.csv.gz
s3://my-object/allobjects/partitiondata/year=2019/month=01/day=29/report-3.csv.gz
How can I achieve this using a Lambda function (Node.js or Python is fine) reading from an SNS topic? Any help is appreciated.
I tried something like this based on some sample code I found online, but it didn't help:
console.log('Loading function');

var AWS = require('aws-sdk');
AWS.config.region = 'us-east-1';

exports.handler = function(event, context) {
    console.log("\n\nLoading handler\n\n");
    var sns = new AWS.SNS();
    sns.publish({
        Message: 'File(s) uploaded successfully',
        TopicArn: 'arn:aws:sns:_my_ARN'
    }, function(err, data) {
        if (err) {
            console.log(err.stack);
            return;
        }
        console.log('push sent');
        console.log(data);
        context.done(null, 'Function Finished!');
    });
};
The preferred method would be for the Amazon S3 Event to trigger the AWS Lambda function directly. But since you cannot alter this part of the workflow, the flow would be:

The Amazon S3 Event triggers the Amazon SNS topic.
The Amazon SNS topic triggers the AWS Lambda function.
The Lambda function uses copy_object() to copy each report to another location, then deletes the original. (There is no move command. You will need to copy the object to a new bucket/key.)

The content of the event field is something like:
{
"Records": [
{
"EventSource": "aws:sns",
"EventVersion": "1.0",
"EventSubscriptionArn": "...",
"Sns": {
"Type": "Notification",
"MessageId": "1c3189f0-ffd3-53fb-b60b-dd3beeecf151",
"TopicArn": "...",
"Subject": "Amazon S3 Notification",
"Message": "{\"Records\":[{\"eventVersion\":\"2.1\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"ap-southeast-2\",\"eventTime\":\"2019-01-30T02:42:07.129Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAIZCFQCOMZZZDASS6Q\"},\"requestParameters\":{\"sourceIPAddress\":\"54.1.1.1\"},\"responseElements\":{\"x-amz-request-id\":\"...\",\"x-amz-id-2\":\"...\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"...\",\"bucket\":{\"name\":\"stack-lake\",\"ownerIdentity\":{\"principalId\":\"...\"},\"arn\":\"arn:aws:s3:::stack-lake\"},\"object\":{\"key\":\"index.html\",\"size\":4378,\"eTag\":\"...\",\"sequencer\":\"...\"}}}]}",
"Timestamp": "2019-01-30T02:42:07.212Z",
"SignatureVersion": "1",
"Signature": "...",
"SigningCertUrl": "...",
"UnsubscribeUrl": "...",
"MessageAttributes": {}
}
}
]
}
Thus, the name of the uploaded object needs to be extracted from the Message field.

You could use code like this:
import json

def lambda_handler(event, context):
    for record1 in event['Records']:
        message = json.loads(record1['Sns']['Message'])
        for record2 in message['Records']:
            bucket = record2['s3']['bucket']['name']
            key = record2['s3']['object']['key']
            # Do something here with bucket and key
    return {
        'statusCode': 200,
        'body': json.dumps(event)
    }