How to parse an AWS S3 file from a Lambda function

I need some help correctly structuring the code to process some text files using S3 buckets and a Lambda function.

I want to use a Lambda function, triggered by the creation of a new object in an S3 bucket, to read the file, extract some data, and write the result to a file placed in another S3 bucket.

So far the function works fine copying the file from one S3 bucket to another, but I can't figure out how to add a function that processes the file and writes the result to the final S3 destination.

The files are simple text files and I need to extract data from each line in the file.

Below is the Node.js code I am currently using, with an additional function added to process the file. See the comments marked ?? for where I am looking for help.

// dependencies
var async = require('async');
var AWS = require('aws-sdk');
var util = require('util');


// get reference to S3 client 
var s3 = new AWS.S3();

exports.handler = function(event, context) {
    // Read options from the event.
    console.log("Reading options from event:\n", util.inspect(event, {depth: 5}));
    var srcBucket = event.Records[0].s3.bucket.name;
    // Object key may have spaces or unicode non-ASCII characters.
    var srcKey    =
    decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));  
    var dstBucket = "inputBucket";
    var dstKey    = srcKey + ".txt";

    // Sanity check: validate that source and destination are different buckets.
    if (srcBucket == dstBucket) {
        console.error("Destination bucket must not match source bucket.");
        return;
    }

    // Infer the file type.
    var typeMatch = srcKey.match(/\.([^.]*)$/);
    if (!typeMatch) {
        console.error('unable to infer file type for key ' + srcKey);
        return;
    }
    var fileType = typeMatch[1];
    if (fileType != "txt") {
        console.log('skipping non-text file ' + srcKey);
        return;
    }

    // Download the file from S3, transform, and upload to a different S3 bucket.
    async.waterfall([
        function download(next) {
            // Download the file from S3 into a buffer.
            s3.getObject({
                    Bucket: srcBucket,
                    Key: srcKey
                },
                next);
            },
        function transform(response, next) {
            // Read the file we have just downloaded 
            // ? response.Body ?
            var rl = require('readline').createInterface({
                input: require('fs').createReadStream('file.in')
            });

            // Process each line here writing the result to an output buffer?
            rl.on('line', function (line) {
                 console.log('Line from file:', line);
                //Do something with the line... 

                //Create some output string 'outputline'

                //Write 'outputline' to an output buffer 'outbuff'
                // ??

            });
            // Now pass the output buffer to the next function
            // so it can be uploaded to another S3 bucket 
            // ?? 
            next;
        },
        function upload(response, next) {
            // Stream the file to a different S3 bucket.
            s3.putObject({
                    Bucket: dstBucket,
                    Key: dstKey,
                    Body: response.Body,
                    ContentType: response.ContentType
                },
                next);
            }
        ], function (err) {
            if (err) {
                console.error(
                    'Unable to process ' + srcBucket + '/' + srcKey +
                    ' and upload to ' + dstBucket + '/' + dstKey +
                    ' due to an error: ' + err
                );
            } else {
                console.log(
                    'Successfully processed ' + srcBucket + '/' + srcKey +
                    ' and uploaded to ' + dstBucket + '/' + dstKey
                );
            }

            context.done();
        }
    );
};

Inside the callback of s3.getObject, the downloaded object is passed in as data:

s3.getObject(params,function(err,data){}) 
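As a rough sketch of that callback shape: `fetchBody` below is a hypothetical helper (not part of the AWS SDK) that accepts any client exposing `getObject(params, callback)`, such as the `s3` client created in the question. The `fakeS3` object is only a stand-in so the snippet runs without AWS credentials; its bucket and key names are made up.

```javascript
// Hypothetical helper: `client` is anything with getObject(params, callback),
// for example the `s3` client from the question.
function fetchBody(client, bucket, key, callback) {
    client.getObject({ Bucket: bucket, Key: key }, function (err, data) {
        if (err) {
            // Pass failures (missing key, access denied, ...) to the caller.
            return callback(err);
        }
        // data.Body holds the raw bytes of the object.
        callback(null, data.Body);
    });
}

// Stand-in client (an assumption for local testing only).
var fakeS3 = {
    getObject: function (params, callback) {
        callback(null, {
            Body: Buffer.from('line one\nline two\n'),
            ContentType: 'text/plain'
        });
    }
};

fetchBody(fakeS3, 'some-bucket', 'some-key.txt', function (err, body) {
    if (err) {
        return console.error(err);
    }
    console.log(body.length + ' bytes downloaded');
});
```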

If your file is text, you can extract its contents as a string:

data.Body.toString("utf-8")
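Building on that, here is one possible way to fill in the question's transform step. It is a sketch that assumes the file is small enough to hold in memory. `processLine` is a hypothetical placeholder rule (it keeps the first comma-separated field), so replace it with whatever extraction each line needs. The object passed to `next` carries a `Body` buffer and a `ContentType` for an upload step to hand to `s3.putObject`.

```javascript
// Hypothetical per-line rule: keep the first comma-separated field.
function processLine(line) {
    return line.split(',')[0].trim();
}

// Convert the downloaded body to text, process each line,
// and collect the results into a new Buffer.
function transformBody(body) {
    var text = body.toString('utf-8');
    var outputLines = text
        .split(/\r?\n/)
        .filter(function (line) { return line.length > 0; }) // skip blank lines
        .map(processLine);
    return Buffer.from(outputLines.join('\n') + '\n', 'utf-8');
}

// Same signature as the transform step in the question's async.waterfall.
function transform(response, next) {
    var outbuff;
    try {
        outbuff = transformBody(response.Body);
    } catch (err) {
        return next(err);
    }
    // Hand the processed buffer on to the upload step.
    next(null, { Body: outbuff, ContentType: 'text/plain' });
}

// Local check using a stand-in for the getObject response.
transform({ Body: Buffer.from('alice,30\nbob,25\n') }, function (err, result) {
    if (err) {
        return console.error(err);
    }
    console.log(result.Body.toString('utf-8'));
});
```

For very large files, reading the whole body into a string may not fit in the function's memory, and a streaming approach would be a better fit.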
