Return all keys from an Amazon S3 bucket which have been modified in the past 24 hours

Here is the situation: I work remotely, and most days people are adding datasets to our Amazon S3 bucket. Each of these datasets requires some very similar processing tasks, which I am able to automate with some pretty simple Python. However, I cannot seem to isolate the datasets that have been added to S3 in the past 24 hours using the modified date. Here is what I have so far:

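from boto3.session import Session

# Placeholder credentials; replace with your own (or omit them and let boto3
# pick up ~/.aws/credentials or environment variables)
ACCESS_KEY = 'xxxx'
SECRET_KEY = 'xxxx'
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
s3_client = session.client('s3')  # same credentials as the resource above

def get_all_s3_keys(bucket):
    """Return every key in `bucket`, following list_objects_v2 pagination."""
    keys = []
    kwargs = {'Bucket': bucket}
    while True:
        resp = s3_client.list_objects_v2(**kwargs)
        # 'Contents' is missing from the response when there is nothing to list
        for obj in resp.get('Contents', []):
            keys.append(obj['Key'])
        try:
            kwargs['ContinuationToken'] = resp['NextContinuationToken']
        except KeyError:
            break  # no more pages
    return keys

bucket_keys = get_all_s3_keys('mybucket')
recent_keys = [key for key in bucket_keys if 'Temp' in key]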

This returns all keys in 'mybucket' containing the word "Temp", but it obviously doesn't help me filter by modified date. Once I get the list of recently modified keys, I want to iterate through them and download them to a predetermined local path.

Any thoughts?

Thanks
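The question's last step, downloading the recently modified keys to a predetermined local path, is not covered by the replies below. A minimal sketch, assuming boto3's `list_objects_v2` paginator and the S3 client's `download_file` method, can filter on the `LastModified` timestamp that each listed object already carries, so no separate date lookup is needed. The function name `download_recent_objects`, its `s3_client` parameter (there only so a stub can be passed in) and the choice to mirror key prefixes as subfolders under `local_dir` are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone


def download_recent_objects(bucket, local_dir, hours=24, s3_client=None):
    """Download objects in `bucket` modified in the last `hours` hours.

    Returns the list of local file paths that were written.
    """
    if s3_client is None:
        import boto3  # imported lazily so a stub client can be passed instead
        s3_client = boto3.client("s3")

    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    paginator = s3_client.get_paginator("list_objects_v2")
    written = []
    for page in paginator.paginate(Bucket=bucket):
        # a page has no 'Contents' entry when there is nothing to list
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # LastModified is a timezone-aware datetime, so it compares with cutoff
            if obj["LastModified"] <= cutoff or key.endswith("/"):
                continue  # too old, or a zero-byte "folder" placeholder
            # mirror the key's prefixes as subfolders; keys are trusted here
            local_path = os.path.join(local_dir, *key.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(bucket, key, local_path)
            written.append(local_path)
    return written
```

Usage would look like `download_recent_objects('mybucket', '/data/incoming')`, where the local path is a placeholder for whatever predetermined directory you use.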

Try this snippet (just get all items and then filter):

import boto3
import datetime

s3 = boto3.resource('s3')
s3_bucket = s3.Bucket('mybucket')
items = [item for item in s3_bucket.objects.filter()]  # get them all; the collection handles pagination
# item.last_modified is timezone-aware (UTC), so "now" must be aware too
# (datetime.timezone requires Python 3)
now = datetime.datetime.now(datetime.timezone.utc)
td = datetime.timedelta(hours=24)
last_24_hours_keys = [item.key for item in items if now - item.last_modified < td]  # filter
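The filter in the answer above can also be written as a generator, sketched here under the assumption that the input is any iterable of items with `.key` and a timezone-aware `.last_modified`, such as boto3's `Bucket(...).objects.all()`. The name `recent_keys` is hypothetical. The cutoff is computed once, and `last_modified > cutoff` is the same test as `now - item.last_modified < td`.

```python
from datetime import datetime, timedelta, timezone


def recent_keys(objects, hours=24, now=None):
    """Yield the keys of objects modified less than `hours` hours before `now`.

    `objects` is any iterable of items exposing `.key` and a timezone-aware
    `.last_modified`, e.g. boto3.resource('s3').Bucket(name).objects.all().
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)  # computed once instead of per object
    for obj in objects:
        # equivalent to `now - obj.last_modified < timedelta(hours=hours)`
        if obj.last_modified > cutoff:
            yield obj.key
```

Called as `list(recent_keys(s3.Bucket('mybucket').objects.all()))`, it iterates the collection page by page instead of first building the full `items` list, which may matter for large buckets.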

HTH.

Wow! Thanks for the advice, @Matt Messersmith. I am using Python 2 (dang Esri Python installation, but I need arcpy), so I have added the slight adjustments for Python 2 below. I had to use pytz instead of datetime.timezone.utc, which does not exist in Python 2.

import datetime

import boto3
import pytz

s3 = boto3.resource('s3')
s3_bucket = s3.Bucket('bucket')
items = [item for item in s3_bucket.objects.filter()]
now = datetime.datetime.now(pytz.utc)  # aware "now" without datetime.timezone
td = datetime.timedelta(hours=24)
last_24_hours_keys = [item.key for item in items if now - item.last_modified < td]
print last_24_hours_keys
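The pytz adjustment is needed because `item.last_modified` is timezone-aware, and Python refuses to subtract an aware datetime from a naive one such as plain `datetime.datetime.now()` (it raises `TypeError`). If pytz cannot be installed into the Esri interpreter, a fixed-offset UTC `tzinfo` subclass, the pattern shown in the Python 2 `datetime` documentation, is one possible stand-in. `UTC`, `utc_now` and `modified_within` are hypothetical names; the sketch avoids Python 3-only syntax so it should run on both versions.

```python
import datetime

ZERO = datetime.timedelta(0)


class UTC(datetime.tzinfo):
    """Fixed-offset UTC tzinfo that needs neither pytz nor datetime.timezone."""

    def utcoffset(self, dt):
        return ZERO

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return ZERO


def utc_now():
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.datetime.now(UTC())


def modified_within(last_modified, hours=24, now=None):
    """True if the aware datetime `last_modified` is less than `hours` hours old."""
    if now is None:
        now = utc_now()
    return now - last_modified < datetime.timedelta(hours=hours)
```

In the Python 2 snippet, `now = utc_now()` could then replace `datetime.datetime.now(pytz.utc)`.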

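Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.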

 
© 2020-2024 STACKOOM.COM