
How to read and overwrite a file in AWS s3 using Lambda and Python?

I'm trying the following, but when I overwrite the file in the same folder that invoked the Lambda, the upload triggers the function again and it goes into a loop. Can anyone please help me? I've pasted below the piece of code I'm using for the Lambda.

Task

  1. Read a file in a folder called 'Folder A' when it is uploaded to this folder
  2. Then truncate a particular column whose values have more than 10 characters
  3. Then upload this file back to the same folder, but unfortunately it goes into a loop because of the Lambda invocation
  4. I tried moving it to a different folder called TrimmedFile and then it works fine without any loops.

Can someone tell me how to read, edit, and save the file back to the same folder that invoked the Lambda?

    import csv
    import urllib.parse
    import boto3

    print('Loading function')

    s3 = boto3.client('s3')
    s3_resource = boto3.resource('s3')

    def lambda_handler(event, context):
        # Get the bucket and object key from the S3 event
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
        try:
            print("file name " + key)
            file_key = key

            # Read the uploaded CSV file
            csvfile = s3.get_object(Bucket=bucket, Key=file_key)
            csvcontent = csvfile["Body"].read().decode("utf-8")
            csv_reader = csv.reader(csvcontent.split("\n"))

            # Keep a backup copy of the original file under BKP/
            copy_source = {'Bucket': bucket, 'Key': file_key}
            src_folder = "FolderA/"
            new_filekey = file_key.replace(src_folder, "")
            print(file_key)
            print(bucket)
            print(new_filekey)
            s3_resource.Bucket(bucket).copy(copy_source, 'BKP/' + new_filekey)

            line_count = 0
            colindex = None
            contentstring = ''
            for row in csv_reader:
                if not row:
                    continue
                row = list(map(str.strip, row))
                if line_count == 0:
                    # Header row: locate the column to truncate
                    if 'ColToTruncate' in row:
                        colindex = row.index('ColToTruncate')
                    else:
                        print('No ColToTruncate column found in ' + file_key)
                        return 'No ColToTruncate column found in ' + file_key
                else:
                    # Truncate values of 10 or more characters to the first 2
                    if len(row[colindex]) >= 10:
                        row[colindex] = row[colindex][0:2]
                line_count += 1
                contentstring += ', '.join(row) + '\n'

            # Overwrite the original object with the edited content
            uploadByteStream = contentstring.encode('utf-8')
            s3.put_object(Bucket=bucket, Key=file_key, Body=uploadByteStream)
            return True
        except Exception as e:
            print(e)
            print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
            raise e

I believe you have created an event trigger on S3 and associated it with the Lambda, so when you replace the file the Lambda is triggered again and it becomes a loop.

There are two ways to handle it:

1. Configure a PUT or POST event type (whichever suits your case) to trigger the Lambda. Then save the updated file at another location and copy it back to the original one. Doing this, S3 generates an "s3:ObjectCreated:Copy" event, which will not invoke the Lambda again.

    # Copy the edited file from the secondary location back to the original key
    copy_sr = {
        "Bucket": bucket,
        "Key": file_key_copy
    }

    s3_resource.meta.client.copy(copy_sr, final_bucket, file_key)

    # Delete the file from the secondary location
    s3_client.delete_object(Bucket=bucket, Key=file_key_copy)
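
If it helps, here is a minimal sketch of the trigger configuration for this option, using boto3's put_bucket_notification_configuration; the bucket name and Lambda ARN are placeholders. It registers the function for plain PUT uploads under FolderA/ only, so the Copy that writes the edited file back does not re-invoke it.

    import boto3

    s3_client = boto3.client('s3')

    # Placeholder bucket name and Lambda ARN -- replace with your own values.
    s3_client.put_bucket_notification_configuration(
        Bucket='my-bucket',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:TrimCsv',
                # Only plain PUT uploads trigger the function; the Copy event used
                # to write the edited file back does not.
                'Events': ['s3:ObjectCreated:Put'],
                'Filter': {
                    'Key': {
                        'FilterRules': [{'Name': 'prefix', 'Value': 'FolderA/'}]
                    }
                }
            }]
        }
    )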

2. Use an SQS queue and configure it not to process any message received twice within a specified period of time (depending on how frequently the file gets updated).
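
As a rough sketch of that idea (queue name is a placeholder), an SQS FIFO queue with content-based deduplication drops messages whose body duplicates one seen within the 5-minute deduplication interval. Note that S3 event notifications cannot publish to FIFO queues directly, so the events would have to be relayed into the queue.

    import boto3

    sqs = boto3.client('sqs')

    # Placeholder queue name -- a FIFO queue with content-based deduplication
    # ignores messages that duplicate one received in the last 5 minutes.
    sqs.create_queue(
        QueueName='s3-file-events.fifo',
        Attributes={
            'FifoQueue': 'true',
            'ContentBasedDeduplication': 'true'
        }
    )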

This is to demonstrate how to read a file and replace it after editing. It can act as skeleton code.

    import io
    import boto3


    client = boto3.client('s3')
    res = boto3.resource('s3')

    def lambda_handler(event, context):

        file_key = event['file_key']
        file_obj = res.Object("bucket_name", file_key)

        # Fetch the existing file contents
        content_obj = file_obj.get()['Body'].read().decode('utf-8')

        res.Object("bucket_name", file_key).delete()  # Here you are deleting the old file

        ###### Perform your operation and save the result in new_data #########
        new_data = content_obj  # placeholder: replace with your edited content

        new_file = io.BytesIO(new_data.encode())

        client.upload_fileobj(new_file, "bucket_name", file_key)  # uploading the file at the exact same location
