S3 multiple files download from a bucket

I have an S3 bucket with paths of the form {productId}/{store}/description.txt. Here's what the bucket might look like at the top level:

ABC123/Store1/description.txt
ABC123/Store2/description.txt
ABC123/Store3/description.txt
DEF123/Store1/description.txt
DEF123/Store2/description.txt

If I had to read all the files pertaining to a certain product ID (for example, ABC123), do I have to navigate into ABC123, list all folders, iterate over each store, and download each file separately? Or is there a way I can do this with a single API call?

PS: I need to do this programmatically.

With boto3 you can filter the listing by prefix, but you still have to iterate over the results and download each object.
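To illustrate the filter-then-iterate step: S3's GetObject request returns one object at a time, so the usual pattern is to list the keys under a prefix and then fetch them individually. Below is a minimal sketch of the listing half using the `list_objects_v2` paginator. The helper name `list_keys` is made up for illustration, and the bucket name and prefix in the usage comment are the placeholders from this thread.

```python
def list_keys(client, bucket, prefix):
    """Return every object key in `bucket` that starts with `prefix`."""
    # The paginator follows continuation tokens, so listings longer than
    # one page (1000 keys) are handled transparently.
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for item in page.get("Contents", []):
            keys.append(item["Key"])
    return keys


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   keys = list_keys(boto3.client("s3"), "your_bucket", "ABC123/")
```

Passing the client in as an argument keeps the helper easy to test with a stub. Note the trailing slash in the prefix: without it, `ABC123` would also match keys such as `ABC1234/...`.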

There are a few ways of doing this, but I usually download the S3 objects in parallel. For example:

import boto3
from multiprocessing import Pool

BUCKET_NAME = 'your_bucket'


def s3_downloader(s3_object_tuple):
    my_bucket, my_object = s3_object_tuple
    # Create the resource inside the worker: boto3 sessions and resources
    # should not be shared across processes.
    s3r = boto3.Session().resource('s3')
    # Flatten the key into a single file name, e.g. ABC123_Store1_description.txt
    out_file = my_object.replace('/', '_')
    print(f'Downloading s3://{my_bucket}/{my_object} to {out_file}')
    s3r.Object(my_bucket, my_object).download_file('/tmp/' + out_file)
    print(f'Downloading finished s3://{my_bucket}/{my_object}')


# The guard is required on platforms that start worker processes with "spawn".
if __name__ == '__main__':
    my_bucket = boto3.Session().resource('s3').Bucket(BUCKET_NAME)
    # Collect (bucket, key) pairs for every object under the product prefix.
    objects_to_download = [
        (my_bucket.name, obj.key)
        for obj in my_bucket.objects.filter(Prefix='ABC123/')
    ]
    with Pool(5) as p:
        p.map(s3_downloader, objects_to_download)
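
Since the downloads are I/O-bound, a thread pool is a possible alternative to separate processes. The sketch below is a variation on the code above, not a replacement for it. It assumes a low-level boto3 client (which boto3 documents as safe to share between threads, unlike resources) and keeps the key's folder layout on disk instead of flattening it. The names `local_path_for` and `download_all` are hypothetical helpers.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def local_path_for(key, dest_dir):
    """Map an S3 key to a path under dest_dir, keeping the key's folder layout."""
    return os.path.join(dest_dir, *key.split("/"))


def download_all(client, bucket, keys, dest_dir, max_workers=5):
    """Download each key with client.download_file and return the local paths."""
    def fetch(key):
        path = local_path_for(key, dest_dir)
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        client.download_file(bucket, key, path)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps the input order and re-raises a worker's exception
        # when its result is reached.
        return list(pool.map(fetch, keys))


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   keys = ["ABC123/Store1/description.txt", "ABC123/Store2/description.txt"]
#   download_all(boto3.client("s3"), "your_bucket", keys, "/tmp/ABC123_files")
```

The list of keys would come from a prefix listing such as `my_bucket.objects.filter(Prefix='ABC123/')` in the example above.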

I believe it is a limitation of the AWS console web interface, having tried (and failed) to do this myself.

Alternatively, perhaps use a third-party S3 browser client such as http://s3browser.com/

If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select all the files you want to download (step 3), and right-click to download them all (step 4).


The S3 service has no meaningful limits on simultaneous downloads (several hundred at a time are easily possible), and there is no policy setting related to this, but the S3 console only allows you to select one file for downloading at a time.

Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.

In case someone is still looking for an S3 browser and downloader, I have just tried FileZilla Pro (the paid version). It worked great.

I created a connection to S3 with the access key and secret key set up via IAM. The connection was instant, and downloading all folders and files was fast.
