Read and copy S3 Inventory data from an SNS topic trigger with an AWS Lambda function

I am a data analyst and new to AWS Lambda functions. I have an S3 bucket where I store the inventory data from our data lake, which is generated using the Inventory feature under the S3 Management tab.

So let's say the inventory data (reports) looks like this:

s3://my-bucket/allobjects/data/report-1.csv.gz
s3://my-bucket/allobjects/data/report-2.csv.gz
s3://my-bucket/allobjects/data/report-3.csv.gz

Regardless of the file contents, I have an event set up for s3://my-bucket/allobjects/data/ which notifies an SNS topic on any event such as GET or PUT. (I can't change this workflow due to strict governance.)

Now I am trying to create a Lambda function with this SNS topic as a trigger and simply move the inventory report files generated by the S3 Inventory feature under

s3://my-bucket/allobjects/data/ 

and repartition them as follows:

s3://my-object/allobjects/partitiondata/year=2019/month=01/day=29/report-1.csv.gz
s3://my-object/allobjects/partitiondata/year=2019/month=01/day=29/report-2.csv.gz
s3://my-object/allobjects/partitiondata/year=2019/month=01/day=29/report-3.csv.gz

How can I achieve this using a Lambda function (Node.js or Python is fine) that reads from the SNS topic? Any help is appreciated.

I tried something like this, based on some sample code I found online, but it didn't help.

console.log('Loading function');

var AWS = require('aws-sdk');  
AWS.config.region = 'us-east-1';

exports.handler = function(event, context) {
    console.log("\n\nLoading handler\n\n");
    var sns = new AWS.SNS();

    sns.publish({
        Message: 'File(s) uploaded successfully',
        TopicArn: 'arn:aws:sns:_my_ARN'
    }, function(err, data) {
        if (err) {
            console.log(err.stack);
            return;
        }
        console.log('push sent');
        console.log(data);
        context.done(null, 'Function Finished!');
    });
};

The preferred method would be for the Amazon S3 event to trigger the AWS Lambda function directly. But since you cannot alter this part of the workflow, the flow would be:

  • The Amazon S3 event sends a message to an Amazon SNS topic.
  • The AWS Lambda function is subscribed to the SNS topic, so it is triggered and receives the message from S3.
  • The Lambda function extracts the bucket and key, then calls copy_object() on S3 to copy the object to another location. (There is no move command. You will need to copy the object to a new bucket/key.)
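
Because S3 has no move operation, a "move" can be sketched as a copy followed by a delete. The following is only a minimal sketch, assuming boto3 is available in the Lambda runtime and that the function's role allows s3:GetObject, s3:PutObject and s3:DeleteObject on the relevant keys. The helper name move_object is hypothetical, and the client is passed in as an argument so that it can be swapped for a stub when testing.

```python
def move_object(s3_client, bucket, source_key, dest_key, dest_bucket=None):
    """Copy an object to a new key, then delete the original.

    s3_client is expected to behave like boto3.client("s3").
    move_object is a hypothetical helper name, not part of boto3.
    """
    # Default to copying within the same bucket.
    dest_bucket = dest_bucket or bucket

    # copy_object is a server-side copy; the data does not pass through Lambda.
    s3_client.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )

    # Reached only if copy_object returned without raising.
    s3_client.delete_object(Bucket=bucket, Key=source_key)
    return dest_bucket, dest_key
```

Inside a handler this could be called as move_object(boto3.client("s3"), bucket, key, new_key). Note that copy_object handles objects up to 5 GB in a single call; larger objects need a multipart copy, which boto3's managed copy() method performs. If the original should stay in place, drop the delete_object call.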

The content of the event field is something like:

{
    "Records": [
        {
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "EventSubscriptionArn": "...",
            "Sns": {
                "Type": "Notification",
                "MessageId": "1c3189f0-ffd3-53fb-b60b-dd3beeecf151",
                "TopicArn": "...",
                "Subject": "Amazon S3 Notification",
                "Message": "{\"Records\":[{\"eventVersion\":\"2.1\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"ap-southeast-2\",\"eventTime\":\"2019-01-30T02:42:07.129Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAIZCFQCOMZZZDASS6Q\"},\"requestParameters\":{\"sourceIPAddress\":\"54.1.1.1\"},\"responseElements\":{\"x-amz-request-id\":\"...\",\"x-amz-id-2\":\"...\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"...\",\"bucket\":{\"name\":\"stack-lake\",\"ownerIdentity\":{\"principalId\":\"...\"},\"arn\":\"arn:aws:s3:::stack-lake\"},\"object\":{\"key\":\"index.html\",\"size\":4378,\"eTag\":\"...\",\"sequencer\":\"...\"}}}]}",
                "Timestamp": "2019-01-30T02:42:07.212Z",
                "SignatureVersion": "1",
                "Signature": "...",
                "SigningCertUrl": "...",
                "UnsubscribeUrl": "...",
                "MessageAttributes": {}
            }
        }
    ]
}

Thus, the bucket name and object key need to be extracted from the Message field, which is itself a JSON string containing the original S3 event.

You could use code like this:

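import json
from urllib.parse import unquote_plus

def lambda_handler(event, context):

    # Each SNS record wraps one S3 notification as a JSON string
    for record1 in event['Records']:
        message = json.loads(record1['Sns']['Message'])

        # The decoded message has its own list of S3 records
        for record2 in message['Records']:

            bucket = record2['s3']['bucket']['name']
            # Object keys are URL-encoded in S3 event notifications
            key = unquote_plus(record2['s3']['object']['key'])

            # Do something here with bucket and key

    return {
        'statusCode': 200,
        'body': json.dumps(event)
    }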
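
The "do something" step would then need a destination key in the year=/month=/day= layout from the question. Here is a minimal sketch of one way to build it, assuming the partition date is taken from the eventTime field of the S3 record (the question does not say which date should be used) and that the destination prefix is allobjects/partitiondata. The helper name partitioned_key and its dest_prefix parameter are hypothetical.

```python
from datetime import datetime
import posixpath


def partitioned_key(source_key, event_time, dest_prefix="allobjects/partitiondata"):
    """Return a Hive-style partitioned key for source_key.

    event_time is the ISO 8601 string from the S3 record's 'eventTime'
    field, e.g. '2019-01-29T02:42:07.129Z'. Using it as the partition
    date is an assumption; the helper name and the dest_prefix default
    are illustrative only.
    """
    # Only the date part (first 10 characters, YYYY-MM-DD) is needed.
    day = datetime.strptime(event_time[:10], "%Y-%m-%d")

    # Keep the file name and drop the original prefix.
    filename = posixpath.basename(source_key)

    return f"{dest_prefix}/year={day:%Y}/month={day:%m}/day={day:%d}/{filename}"


print(partitioned_key("allobjects/data/report-1.csv.gz", "2019-01-29T02:42:07.129Z"))
# allobjects/partitiondata/year=2019/month=01/day=29/report-1.csv.gz
```

Inside the inner loop, record2['eventTime'] would supply the timestamp, and the returned string can be passed as the Key argument of copy_object, with CopySource pointing at the original bucket and key.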
