Here is the situation: I work remotely, and most days people add datasets to our Amazon S3 bucket. Each of these datasets requires some very similar processing tasks, which I am able to automate with some pretty simple Python. However, I cannot seem to isolate the datasets that have been added to S3 in the past 24 hours using the modified date. Here is what I have so far:
import boto3
from boto3.session import Session

ACCESS_KEY = 'xxxx'
SECRET_KEY = 'xxxx'
session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
# Create the client from the same session so it uses the keys above.
s3_client = session.client('s3')

def get_all_s3_keys(bucket):
    keys = []
    kwargs = {'Bucket': bucket}
    while True:
        resp = s3_client.list_objects_v2(**kwargs)
        # 'Contents' is missing when the bucket is empty.
        for obj in resp.get('Contents', []):
            keys.append(obj['Key'])
        try:
            kwargs['ContinuationToken'] = resp['NextContinuationToken']
        except KeyError:
            break
    return keys

bucket_keys = get_all_s3_keys('mybucket')
recent_keys = [key for key in bucket_keys if 'Temp' in key]
This will return all keys in 'mybucket' containing the word "Temp", but this obviously doesn't help me with the modified date. Once I get the list of recently modified keys, I want to be able to iterate through and download them to a predetermined local path.
Any thoughts?
Thanks
Try this snippet (just get all items and then filter):
import boto3
import datetime
s3 = boto3.resource('s3')
s3_bucket = s3.Bucket('mybucket')
items = [item for item in s3_bucket.objects.filter()] # get them all
now = datetime.datetime.now(datetime.timezone.utc)
td = datetime.timedelta(hours=24)
last_24_hours_keys = [item.key for item in items if now - item.last_modified < td] # filter
HTH.
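If the bucket is large, a possible variation is to filter while paging through the listing instead of collecting every object first. The sketch below assumes a boto3 S3 client is passed in and relies on the `LastModified` field that `list_objects_v2` returns for each object. The function name `recent_keys` and its `hours` and `now` parameters are made up for illustration. It uses `datetime.timezone.utc`, so it needs Python 3; on Python 2, pass `now=datetime.datetime.now(pytz.utc)` instead.

```python
import datetime


def recent_keys(s3_client, bucket, hours=24, now=None):
    """Return keys in `bucket` whose LastModified is within the last `hours`."""
    if now is None:
        # S3 reports LastModified as a timezone-aware UTC datetime,
        # so compare against an aware "now" as well.
        now = datetime.datetime.now(datetime.timezone.utc)
    cutoff = now - datetime.timedelta(hours=hours)

    keys = []
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent when a page (or the whole bucket) is empty.
        for obj in page.get('Contents', []):
            if obj['LastModified'] > cutoff:
                keys.append(obj['Key'])
    return keys
```

It would be called as `recent_keys(boto3.client('s3'), 'mybucket')`. Note that S3 still lists every object either way; this only avoids holding the full listing in memory.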
Wow! Thanks for the advice @Matt Messersmith. I am using Python 2 (dang Esri Python installation, but I need arcpy). I will add the slight adjustments for Python 2 below. I had to use pytz instead of datetime.timezone.utc.
import datetime
import boto3
import pytz

s3 = boto3.resource('s3')
s3_bucket = s3.Bucket('bucket')
items = [item for item in s3_bucket.objects.filter()]
now = datetime.datetime.now(pytz.utc)
td = datetime.timedelta(hours=24)
last_24_hours_keys = [item.key for item in items if now - item.last_modified < td]
print last_24_hours_keys
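For the download step mentioned in the question, something along these lines might work. It is only a sketch: `download_keys` and `dest_dir` are hypothetical names, and it assumes a boto3 S3 client (which provides `download_file`) is passed in. It recreates the key's prefix structure under the local directory and skips keys ending in `/`, which are usually empty "folder" placeholders. It avoids Python 3 only features, so it should also run on Python 2.

```python
import os


def download_keys(s3_client, bucket, keys, dest_dir):
    """Download each key in `keys` from `bucket` into `dest_dir`.

    Returns the list of local file paths that were written.
    """
    local_paths = []
    for key in keys:
        # Keys ending in '/' are "folder" placeholders, not files.
        if key.endswith('/'):
            continue
        # Mirror the key's prefix structure under dest_dir.
        local_path = os.path.join(dest_dir, *key.split('/'))
        parent = os.path.dirname(local_path)
        if not os.path.isdir(parent):
            os.makedirs(parent)
        s3_client.download_file(bucket, key, local_path)
        local_paths.append(local_path)
    return local_paths
```

With the list built above, the call would look like `download_keys(boto3.client('s3'), 'bucket', last_24_hours_keys, dest_dir)`, where `dest_dir` is the predetermined local path.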