How to filter S3 objects by last modified date with Boto3

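Is there a way to filter S3 objects by last modified date in boto3? I have built a large text file listing everything in a bucket. Some time has passed, and I now want to list only the objects that were added since the last time I looped through the entire bucket.

I know I can use the Marker property to start listing from a given object name, so I could pass it the last object I processed in the text file. That does not guarantee, however, that no new object was added before that name. For example, if the last file in the text file was oak.txt and a new file called apple.txt was added, the listing would not pick it up.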

import boto3

s3_resource = boto3.resource('s3')
client = boto3.client('s3')

def list_rasters(bucket):

    bucket = s3_resource.Bucket(bucket)

    for bucket_obj in bucket.objects.filter(Prefix="testing_folder/"):
        print(bucket_obj.key)
        print(bucket_obj.last_modified)

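The following snippet gets all objects under a specific folder and checks whether each one was last modified after the date you specify.

Replace YEAR, MONTH, and DAY with your own values.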

import boto3
import datetime

# Bucket name
bucket_name = 'BUCKET NAME'
# Folder name (used as the key prefix)
folder_name = 'FOLDER NAME'
# Bucket resource
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)

def lambda_handler(event, context):
    for file in bucket.objects.filter(Prefix=folder_name):
        # last_modified is timezone-aware (UTC); drop tzinfo so it can be
        # compared with the naive datetime built from YEAR, MONTH, DAY
        if file.last_modified.replace(tzinfo=None) > datetime.datetime(YEAR, MONTH, DAY):
            # Print results
            print('File Name: %s ---- Date: %s' % (file.key, file.last_modified))
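The comparison above works by stripping the timezone from last_modified. As a minimal sketch of the same filter with both sides kept timezone-aware, the helper below treats a naive cutoff as UTC (an assumption). The names objects_modified_since and list_new_keys are hypothetical and not part of boto3. The filter accepts any iterable of items exposing key and last_modified, such as the summaries returned by bucket.objects.filter().

```python
from datetime import datetime, timezone


def objects_modified_since(objects, cutoff):
    """Yield objects whose last_modified is strictly later than cutoff."""
    if cutoff.tzinfo is None:
        # Assumption: a naive cutoff is meant as UTC.
        cutoff = cutoff.replace(tzinfo=timezone.utc)
    for obj in objects:
        if obj.last_modified > cutoff:
            yield obj


def list_new_keys(bucket_name, prefix, cutoff):
    """Return keys under prefix modified after cutoff.

    Needs boto3 and AWS credentials; boto3 is imported here so the
    filter above can be tried without it.
    """
    import boto3
    bucket = boto3.resource('s3').Bucket(bucket_name)
    summaries = bucket.objects.filter(Prefix=prefix)
    return [obj.key for obj in objects_modified_since(summaries, cutoff)]
```

Note that this still lists every object under the prefix and filters on the client side, just like the snippet above.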

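The snippet below uses the get() action of the S3 Object class with the IfModifiedSince datetime parameter, so only objects modified after that time are returned. The script prints the file names, which is what the original question asked for, and also saves the files locally.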

import boto3
from botocore.exceptions import ClientError
from datetime import datetime, timedelta, timezone


# Defining AWS S3 resources
s3 = boto3.resource('s3')
bucket = s3.Bucket('<bucket_name>')
prefix = '<object_key_prefix, if any>'

# Cutoff: midnight UTC at the start of yesterday
today_utc = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
yesterday = today_utc - timedelta(days=1)

# Retrieve the streaming body from S3, or None if the object
# has not been modified since the cutoff
def get_object(file_name):
    try:
        obj = file_name.get(IfModifiedSince=yesterday)
        return obj['Body']
    except ClientError as e:
        # S3 answers HTTP 304 (Not Modified) for objects older than the cutoff
        if e.response.get('ResponseMetadata', {}).get('HTTPStatusCode') == 304:
            return None
        raise


# Obtain a list of S3 objects with the prefix filter
files = list(bucket.objects.filter(Prefix=prefix))

# Iterate through the list of files, print each file name, and
# save the streaming body to a local file with the same name
for file in files:
    if file.key.endswith('/'):
        continue  # skip "directory" placeholder keys
    file_name = file.key[len(prefix):]  # file name of the S3 object without the prefix
    s3_file = get_object(file)  # streaming body to iterate through
    if s3_file:  # modified since the cutoff
        print(file_name)  # prints files modified since the cutoff
        try:
            with open(file_name, 'wb') as f:
                for chunk in s3_file:  # iterate through the streaming body
                    f.write(chunk)
        except OSError as e:
            print(e, file)
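To apply either approach to the original question (listing only what was added since the previous full scan), the cutoff has to be remembered between runs. A minimal sketch, assuming the timestamp is kept in a small local text file; load_cutoff, save_cutoff, and the file path are hypothetical names chosen for illustration:

```python
from datetime import datetime, timezone
from pathlib import Path


def load_cutoff(path):
    """Return the saved scan time, or None if no scan has been recorded yet."""
    p = Path(path)
    if not p.exists():
        return None
    return datetime.fromisoformat(p.read_text().strip())


def save_cutoff(path, when):
    """Persist a timezone-aware scan time as ISO 8601 text."""
    Path(path).write_text(when.isoformat())
```

One way to use it: record scan_started = datetime.now(timezone.utc) before listing the bucket, and call save_cutoff only after the scan finishes. Objects uploaded while the scan is running are then picked up by the next run instead of being missed.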


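Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.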
