
Download latest file in S3 bucket folder

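I am writing a Python script to download the latest file from a folder in an S3 bucket. I know how to download the latest file object from the top level of my S3 bucket, but the file I want is in a folder inside the bucket. I have no idea how to do that or where it would go in my code. I tried appending the path to the end of the bucket name, but that does not seem to work.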

import boto3
import botocore.exceptions

# AWS credentials (aws_server_access_key, aws_server_secret_key and BUCKET_NAME are defined elsewhere)
client = boto3.client('athena', aws_access_key_id=aws_server_access_key, aws_secret_access_key=aws_server_secret_key, region_name='us-east-1')
ba = boto3.client('s3', aws_access_key_id=aws_server_access_key, aws_secret_access_key=aws_server_secret_key, region_name='us-east-1')

# Get the most recently modified object. LastModified is a datetime, so it can be
# compared directly; strftime('%s') is platform-specific and not needed here.
get_last_modified = lambda obj: obj['LastModified']

objs = ba.list_objects_v2(Bucket=BUCKET_NAME)['Contents']
# Sort newest first so that index 0 is the latest object
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]

s3 = boto3.resource('s3', aws_access_key_id=aws_server_access_key, aws_secret_access_key=aws_server_secret_key)
try:
    s3.Bucket(BUCKET_NAME).download_file(last_added, last_added)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
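
For context on why appending the path to the bucket name fails: S3 has no real folders. The "folder" is only the leading part of each object key, and the listing call accepts it separately as `Prefix`. Below is a minimal sketch of that split. The helper names `split_bucket_path` and `local_name` are hypothetical and are not part of boto3.

```python
import posixpath


def split_bucket_path(path):
    """Split 'bucket/folder/sub' into ('bucket', 'folder/sub/').

    Hypothetical helper: the bucket name goes to Bucket=, the rest to Prefix=.
    """
    bucket, _, folder = path.strip("/").partition("/")
    # A trailing slash keeps 'reports' from also matching 'reports-old/...'
    prefix = folder + "/" if folder else ""
    return bucket, prefix


def local_name(key):
    """Use only the last component of the key as the local file name."""
    return posixpath.basename(key)
```

With these, the listing in the question would become something like `ba.list_objects_v2(Bucket=bucket, Prefix=prefix)`, and `local_name(last_added)` could serve as the download target so that no local subdirectory has to exist first.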

Credit goes to the answer linked in the docstring; this is only a small modification of that earlier answer.

import os

import boto3


def download_latest_in_dir(prefix, local, bucket, client=None, nLatest=2):
    """
    from https://stackoverflow.com/questions/31918960/boto3-to-download-all-files-from-a-s3-bucket/31929277

    params:
    - prefix: key prefix (the "folder" path) to match in s3
    - local: local path to folder in which to place files
    - bucket: s3 bucket with target contents
    - client: initialized s3 client object; a default client is created if omitted
    - nLatest: number of the most recent files to fetch from aws

    Example: download two latest files from aws directory ieee-temp/sst to local directory /home/hu-mka/Downloads/sst
    download_latest_in_dir(prefix='sst', local='/home/hu-mka/Downloads', bucket='ieee-temp', client=boto3.client('s3'), nLatest=2)
    """
    # Create the default client at call time rather than when the function is defined
    if client is None:
        client = boto3.client('s3')
    files = []
    times = []
    dirs = []
    next_token = ''
    base_kwargs = {
        'Bucket':bucket,
        'Prefix':prefix,
    }
    ipage = 0
    while next_token is not None:
        kwargs = base_kwargs.copy()
        if next_token != '':
            kwargs.update({'ContinuationToken': next_token})
        results = client.list_objects_v2(**kwargs)
        # 'Contents' is absent from the response when no key matches the prefix
        contents = results.get('Contents', [])
        for i in contents:
            k = i.get('Key')
            if k[-1] != '/':
                files.append(k)
                t = i.get('LastModified')
                times.append(t)
            else:
                print(f"Warning: skipping a subdirectory placeholder key: {k}")
                #dirs.append(k)
        if files:
            print(f"A page read {ipage}, last item: {files[-1]}, its time stamp: {times[-1]}")
        next_token = results.get('NextContinuationToken')
        ipage += 1
        #if ipage > 2:
        #    break
    # https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list
    time_sorted_filenames = [x for _, x in sorted(zip(times, files))]
    #print(time_sorted_filenames)
    for k in time_sorted_filenames[-nLatest:]:
        dest_pathname = os.path.join(local, k)
        # exist_ok=True avoids a separate existence check
        os.makedirs(os.path.dirname(dest_pathname), exist_ok=True)
        client.download_file(bucket, k, dest_pathname)
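
If only the single newest file is needed, the same idea can be written more compactly with boto3's `list_objects_v2` paginator, which handles the continuation token internally. This is a sketch under stated assumptions, not a replacement for the function above. The names `newest_key` and `latest_key_under_prefix` are hypothetical, and `client` is assumed to be an S3 client such as `boto3.client('s3')`.

```python
def newest_key(objects):
    """Return the key of the most recently modified object, or None.

    `objects` is a list of dicts shaped like the 'Contents' entries of a
    list_objects_v2 response. Keys ending in '/' are treated as folder
    placeholders and skipped.
    """
    candidates = [o for o in objects if not o["Key"].endswith("/")]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o["LastModified"])["Key"]


def latest_key_under_prefix(client, bucket, prefix):
    """List every object under `prefix` and return the newest key."""
    paginator = client.get_paginator("list_objects_v2")
    objects = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching keys carry no 'Contents' entry
        objects.extend(page.get("Contents", []))
    return newest_key(objects)
```

The returned key can then be passed to `client.download_file(bucket, key, local_path)`. S3 does not return listings sorted by modification time, so all keys under the prefix still have to be listed before the newest one can be picked.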
