Download latest file in S3 bucket folder
I am writing a Python script to download the latest file from a folder inside an S3 bucket. I understand how to download the latest file object from the bucket itself, but the files I want are in a folder inside the bucket. I don't know how to target that folder, or where in my code it should be specified. I tried appending the folder path to the end of my bucket link, but that did not work.
import boto3
import botocore.exceptions

# AWS Credentials
client = boto3.client('athena', aws_access_key_id=aws_server_access_key, aws_secret_access_key=aws_server_secret_key, region_name='us-east-1')
ba = boto3.client('s3', aws_access_key_id=aws_server_access_key, aws_secret_access_key=aws_server_secret_key, region_name='us-east-1')
# Get latest modified file
# LastModified is already a datetime, so it can be the sort key directly
# (strftime('%s') is a platform-specific extension and is not portable)
get_last_modified = lambda obj: obj['LastModified']
objs = ba.list_objects_v2(Bucket=BUCKET_NAME)['Contents']
# sorted() is ascending, so the newest object is the last element, not the first
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][-1]
s3 = boto3.resource('s3', aws_access_key_id=aws_server_access_key, aws_secret_access_key=aws_server_secret_key)
try:
    s3.Bucket(BUCKET_NAME).download_file(last_added, last_added)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
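S3 has no real directories: a "folder" is only a shared key prefix such as reports/, and list_objects_v2 accepts a Prefix argument to restrict the listing to it. Below is a minimal sketch of how the "find the newest key" step could work on a listing that has already been filtered by Prefix. It is written against the response pages of a list_objects_v2 paginator so that folders with more than 1,000 objects are still covered. The helper name newest_key is hypothetical. The sketch also assumes LastModified values are comparable, which holds for the datetime objects boto3 returns.

```python
from typing import Any, Iterable, Mapping, Optional


def newest_key(pages: Iterable[Mapping[str, Any]]) -> Optional[str]:
    """Return the Key of the most recently modified object across listing pages.

    `pages` is assumed to be list_objects_v2 response pages that were already
    restricted to one "folder" with Prefix=...; returns None if no file matched.
    """
    newest = None  # (LastModified, Key) of the newest object seen so far
    for page in pages:
        # A page has no 'Contents' entry at all when nothing matches the prefix.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            if newest is None or obj["LastModified"] > newest[0]:
                newest = (obj["LastModified"], obj["Key"])
    return None if newest is None else newest[1]
```

With a boto3 client such as ba above, the pages could come from ba.get_paginator('list_objects_v2').paginate(Bucket=BUCKET_NAME, Prefix='my-folder/'). Here 'my-folder/' is a placeholder for the real folder name. A non-None result could then be passed to ba.download_file(BUCKET_NAME, key, filename). A single pass with a running maximum avoids sorting the whole listing just to take one element.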
Credits are in the comments; this is just a small modification of an earlier answer.
import os

import boto3


def download_latest_in_dir(prefix, local, bucket, client=None, nLatest=2):
    """
    from https://stackoverflow.com/questions/31918960/boto3-to-download-all-files-from-a-s3-bucket/31929277
    params:
    - prefix: key prefix ("folder" path) to list in s3; a literal prefix, not a pattern
    - local: local path to folder in which to place files
    - bucket: s3 bucket with target contents
    - client: initialized s3 client object (a default boto3 s3 client is created if None)
    - nLatest: number of the most recent files to fetch from aws
    Example: download the two latest files from aws directory ieee-temp/sst to local directory /home/hu-mka/Downloads/sst
    download_latest_in_dir(prefix='sst', local='/home/hu-mka/Downloads', bucket='ieee-temp', client=boto3.client('s3'), nLatest=2)
    """
    if client is None:
        # Create the client at call time; a default argument expression would be
        # evaluated only once, when the def statement runs.
        client = boto3.client('s3')
    files = []
    times = []
    next_token = ''
    base_kwargs = {
        'Bucket': bucket,
        'Prefix': prefix,
    }
    ipage = 0
    while next_token is not None:
        kwargs = base_kwargs.copy()
        if next_token != '':
            kwargs.update({'ContinuationToken': next_token})
        results = client.list_objects_v2(**kwargs)
        # 'Contents' is missing from the response when no key matches the prefix
        contents = results.get('Contents', [])
        for i in contents:
            k = i.get('Key')
            if k[-1] != '/':
                files.append(k)
                t = i.get('LastModified')
                times.append(t)
            else:
                print(f"Warning: there was a subdirectory which we omit: {k}")
        if files:
            print(f"A page read {ipage}, last item: {files[-1]}, its time stamp: {times[-1]}")
        # absent (None) on the last page, which ends the loop
        next_token = results.get('NextContinuationToken')
        ipage += 1
    # https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list
    # file names ordered by LastModified, oldest first
    time_sorted_filenames = [x for _, x in sorted(zip(times, files))]
    for k in time_sorted_filenames[-nLatest:]:
        dest_pathname = os.path.join(local, k)
        # exist_ok makes this safe if the directory already exists
        os.makedirs(os.path.dirname(dest_pathname), exist_ok=True)
        client.download_file(bucket, k, dest_pathname)
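The selection step above (sorting zip(times, files) and slicing the last nLatest names) could also be written directly against the Contents entries of the listing. This is a hedged sketch, not the answer's method. newest_keys is a hypothetical helper, and objects is assumed to be the Contents entries collected from all list_objects_v2 pages. It uses heapq.nlargest in place of a full sort.

```python
import heapq
from typing import Any, Iterable, List, Mapping


def newest_keys(objects: Iterable[Mapping[str, Any]], n: int) -> List[str]:
    """Return up to n object keys, newest first, skipping "folder" placeholders."""
    files = (o for o in objects if not o["Key"].endswith("/"))
    # nlargest keeps at most n items in memory instead of sorting the whole listing
    newest = heapq.nlargest(n, files, key=lambda o: o["LastModified"])
    return [o["Key"] for o in newest]
```

One difference from the slice in the answer: time_sorted_filenames[-0:] returns the whole list, so nLatest=0 would download everything. In contrast, newest_keys(objects, 0) returns an empty list.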
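Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.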