
How to read and overwrite a file in AWS S3 using Lambda and Python?

I'm trying the following, but when I overwrite the file that invoked the Lambda, the function is triggered again and it goes into a loop. Can anyone please help me? The code I am using for the Lambda is pasted below.

Task

  1. Read a file in a folder called 'FolderA' when it is uploaded to this folder.
  2. Then truncate a particular column whose value has more than 10 characters.
  3. Then upload the file back to the same folder, but unfortunately this goes into a loop because the upload invokes the Lambda again.
  4. I tried moving the output to a different folder called TrimmedFiles, and then it works fine without any loops.

Can someone tell me how to read, edit, and save the file back into the same folder that triggered the Lambda?

    import json
    import urllib.parse
    import boto3
    import os
    import csv

    print('Loading function')
    s3 = boto3.client('s3')

    def lambda_handler(event, context):
        # Get the bucket and object key from the event
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        try:
            print("file name " + key)
            file_key = key

            # Read the uploaded CSV from S3
            csvfile = s3.get_object(Bucket=bucket, Key=file_key)
            csvcontent = csvfile["Body"].read().decode("utf-8")
            file = csvcontent.split("\n")
            csv_reader = csv.reader(file)

            line_count = 0
            colindex = ''
            content = []
            contentstring = ''

            # Keep a backup copy of the original file under BKP/
            s33 = boto3.resource('s3')
            copy_source = {
                'Bucket': bucket,
                'Key': file_key
            }
            new_bucket = s33.Bucket(bucket)
            print(file_key)
            print(bucket)
            src_folder = "FolderA/"
            new_filekey = file_key.replace(src_folder, "")
            print(new_filekey)
            new_bucket.copy(copy_source, 'BKP/' + new_filekey)

            # Truncate the target column when its value is too long
            for row in csv_reader:
                if row:
                    row = list(map(str.strip, row))
                    if line_count == 0:
                        if 'ColToTruncate' in row:
                            colindex = row.index('ColToTruncate')
                            line_count += 1
                        else:
                            print('No ColToTruncate column found in ' + file_key)
                            return 'No ColToTruncate column found in ' + file_key
                    else:
                        if len(row[colindex]) >= 10:
                            row[colindex] = row[colindex][0:2]
                        line_count += 1
                    content.append(row)
                    contentstring += ', '.join(row)
                    contentstring = contentstring + '\n'

            uploadByteStream = bytes(contentstring.encode('utf-8'))
            # new_key = 'TrimmedFiles/' + new_filekey  # writing here instead avoids the loop
            s3.put_object(Bucket=bucket, Key=file_key, Body=uploadByteStream)
            return True
        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
            raise e

I believe you have created an event trigger on S3 and associated it with the Lambda, so when you replace the file the Lambda is triggered again and it becomes a loop.

There could be two ways to handle it:

1. Configure a PUT or POST event type (whichever suits your case) to trigger the Lambda. Now save the updated file at another location and then copy it to the original one. Doing this, S3 will generate an "s3:ObjectCreated:Copy" event, which will not invoke the Lambda again.

    # Copying the file from the secondary location back to the original location
    copy_sr = {
        "Bucket": bucket,
        "Key": file_key_copy
    }

    s3_resource.meta.client.copy(copy_sr, final_bucket, file_key_copy)

    # Deleting the file from the secondary location
    s3_client.delete_object(Bucket=bucket, Key=file_key_copy)

2. Use an SQS queue and configure it so that a message received twice within a specified period of time is not processed again (depending on how frequently the file gets updated).
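One possible way to implement this idea is to have the S3 notifications delivered to the Lambda through a standard SQS queue and record each processed object with a conditional write to a small DynamoDB table, skipping objects already seen within the window. This is only a sketch under assumptions not in the original answer: the queue wiring, the hypothetical table name processed-objects with partition key pk and TTL attribute expires_at, and the 5-minute window are all illustrative.

    import json
    import time
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client('s3')
    table = boto3.resource('dynamodb').Table('processed-objects')  # hypothetical table

    DEDUP_WINDOW_SECONDS = 300  # the "specified period of time"; the value is an assumption

    def lambda_handler(event, context):
        # Each SQS record wraps one S3 event notification in its body
        for sqs_record in event['Records']:
            s3_event = json.loads(sqs_record['body'])
            for s3_record in s3_event.get('Records', []):
                bucket = s3_record['s3']['bucket']['name']
                key = s3_record['s3']['object']['key']
                try:
                    # Conditional put fails if this object was processed recently,
                    # which is what breaks the re-trigger loop.
                    table.put_item(
                        Item={
                            'pk': f'{bucket}/{key}',
                            'expires_at': int(time.time()) + DEDUP_WINDOW_SECONDS,
                        },
                        ConditionExpression='attribute_not_exists(pk)',
                    )
                except ClientError as e:
                    if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
                        print(f'Skipping {key}: already processed recently')
                        continue
                    raise
                # ... read, edit and overwrite the object here, as in the question ...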

This is to demonstrate how to read a file and replace it after editing. It can act as skeleton code.

    import boto3
    import io

    client = boto3.client('s3')
    res = boto3.resource('s3')

    def lambda_handler(event, context):

        file_key = event['file_key']
        file_obj = res.Object("bucket_name", file_key)

        content_obj = file_obj.get()['Body'].read().decode('utf-8')  # fetching the data

        res.Object("bucket_name", file_key).delete()  # here you are deleting the old file

        ###### Perform your operation and save the result in a new_data variable ######

        new_file = io.BytesIO(new_data.encode())

        client.upload_fileobj(new_file, "bucket_name", file_key)  # uploading the file to the exact same location
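For completeness, one way the new_data placeholder in the skeleton could be produced for this particular question is to apply the same column truncation as the original handler. This is only an illustrative helper, not part of the answer's code; the column name and limits mirror the question but are otherwise assumptions.

    import csv
    import io

    def truncate_column(csv_text, column_name='ColToTruncate', max_len=10, keep=2):
        """Return the CSV text with values in column_name truncated when too long.

        Hypothetical helper for the skeleton above.
        """
        reader = csv.reader(io.StringIO(csv_text))
        out = io.StringIO()
        writer = csv.writer(out)
        rows = [row for row in reader if row]
        header = rows[0]
        colindex = header.index(column_name)
        writer.writerow(header)
        for row in rows[1:]:
            if len(row[colindex]) >= max_len:
                row[colindex] = row[colindex][:keep]
            writer.writerow(row)
        return out.getvalue()

    # new_data = truncate_column(content_obj)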
