I want to download all the CSV files in the S3 folder 2021-02-15. I tried the following, but it failed. How can I do it?
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    client = boto3.client('s3')
    client.download_file(bucket, obj, obj)
ValueError: Filename must be a string
Marcin's answer is correct, but files with the same name in different paths would be overwritten. You can avoid that by replicating the S3 bucket's folder structure locally.
import boto3
import os
from pathlib import Path
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    # print(obj.key)
    # remove the file name from the object key
    obj_path = os.path.dirname(obj.key)
    # create nested directory structure
    Path(obj_path).mkdir(parents=True, exist_ok=True)
    # save file with full path locally
    bucket.download_file(obj.key, obj.key)
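If you would rather not recreate the whole product/myproject/... hierarchy in the working directory, a variation is to place each file under a local folder of your choice, relative to the prefix. The sketch below assumes you only want .csv files (as the question asks) and that bucket is a boto3 Bucket resource; plan_downloads, download_planned, dest_dir and suffix are hypothetical names, not boto3 APIs. It separates the key-to-path mapping from the download, so the mapping can be checked without touching S3.

```python
from pathlib import Path


def plan_downloads(keys, prefix, dest_dir, suffix=".csv"):
    """Map S3 keys under `prefix` to local paths under `dest_dir`.

    Returns (key, local_path) pairs, keeping the sub-folder layout below the
    prefix. Keys outside the prefix, keys ending in "/" (zero-byte "folder"
    placeholders), and keys without `suffix` are skipped.
    """
    plan = []
    for key in keys:
        if not key.startswith(prefix) or key.endswith("/") or not key.endswith(suffix):
            continue
        # Part of the key below the prefix, e.g. "sub/a.csv"
        relative = key[len(prefix):].lstrip("/")
        if not relative:
            continue
        plan.append((key, Path(dest_dir).joinpath(*relative.split("/"))))
    return plan


def download_planned(bucket, prefix, dest_dir):
    """Download the planned keys; `bucket` is assumed to be a boto3 Bucket resource."""
    keys = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
    for key, local_path in plan_downloads(keys, prefix, dest_dir):
        # Create local sub-folders before saving the file into them
        local_path.parent.mkdir(parents=True, exist_ok=True)
        bucket.download_file(key, str(local_path))
```

For example, download_planned(bucket, 'product/myproject/2021-02-15/', 'local_folder') would save .../2021-02-15/sub/a.csv as local_folder/sub/a.csv. Keeping plan_downloads free of I/O means you can print the plan first and confirm nothing collides or lands in the wrong place.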
filter() returns a collection of ObjectSummary objects rather than plain key strings, whereas the client's download_file() method expects the bucket name, the object key, and the local file name, all as strings.
Try this:
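import os
import boto3
objs = list(bucket.objects.filter(Prefix=key))
client = boto3.client('s3')
for obj in objs:
    # client.download_file expects strings: bucket name, object key, local file name
    client.download_file(bucket.name, obj.key, os.path.basename(obj.key))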
You can also call print(obj) inside the loop to see what each object actually contains.
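If you prefer to stay with the low-level client throughout, you can list the keys with it as well: list_objects_v2 responses contain plain key strings, and the client's paginator follows continuation tokens past the 1,000-keys-per-response limit. A minimal sketch, assuming a boto3 S3 client is passed in and that a flat download by file name is acceptable; download_csvs and dest_dir are hypothetical names.

```python
import os


def download_csvs(client, bucket_name, prefix, dest_dir="."):
    """Download every .csv object under `prefix` into `dest_dir`, by file name.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the local file names that were written.
    """
    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches the prefix
        for item in page.get("Contents", []):
            key = item["Key"]  # a plain string, unlike the resource API's ObjectSummary
            if not key.endswith(".csv"):
                continue  # also skips "folder" placeholder keys ending in "/"
            filename = os.path.join(dest_dir, os.path.basename(key))
            client.download_file(bucket_name, key, filename)
            downloaded.append(filename)
    return downloaded
```

Usage would look like download_csvs(boto3.client('s3'), 'bucket', 'product/myproject/2021-02-15/'). Because files are saved by name only, same-named files in different sub-folders overwrite each other.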
Since you are using the resource API, you can call download_file directly on the bucket:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    # print(obj.key)
    out_name = obj.key.split('/')[-1]
    bucket.download_file(obj.key, out_name)
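Because this saves each object under its file name only, keys such as .../a.csv and .../sub/a.csv map to the same local file, and the later download overwrites the earlier one. A small check, sketched here with a hypothetical helper basename_collisions, can tell you whether that happens for a given prefix before you download anything:

```python
from collections import defaultdict


def basename_collisions(keys):
    """Return {file_name: [keys]} for file names shared by more than one key.

    Downloading by file name only (key.split('/')[-1]) would make these keys
    overwrite each other locally.
    """
    by_name = defaultdict(list)
    for key in keys:
        name = key.split("/")[-1]
        if name:  # "folder" placeholder keys end in "/" and have no file name
            by_name[name].append(key)
    return {name: ks for name, ks in by_name.items() if len(ks) > 1}
```

For instance, basename_collisions(obj.key for obj in objs) returning an empty dict means the flat download is safe for that prefix; otherwise, mirroring the folder structure avoids the overwrites.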
You could also use cloudpathlib, which wraps boto3 for S3. For your use case, it's pretty simple:
from cloudpathlib import CloudPath
cp = CloudPath("s3://bucket/product/myproject/2021-02-15/")
cp.download_to("local_folder")
Following the accepted answer and using your example of key, I get the following error:
NotADirectoryError: [Errno 20] Not a directory: 'product/myproject/2021-02-15/.d6Ac4540d' -> 'product/myproject/2021-02-15/'
Do you know how to solve this issue? Thanks!
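The error message suggests that the listing includes an object whose key is the prefix itself, product/myproject/2021-02-15/, i.e. the zero-byte placeholder that the S3 console creates when you make a "folder". Passing that key as the local file name asks boto3 to write to a directory path: as the message shows, it downloads to a temporary file and then tries to rename it onto 'product/myproject/2021-02-15/', which fails. A minimal sketch of a fix, assuming the same structure-preserving loop and a boto3 Bucket resource, is to skip keys that end in "/"; mirror_prefix is a hypothetical name.

```python
import os
from pathlib import Path


def mirror_prefix(bucket, prefix):
    """Recreate the S3 layout under the current directory; return downloaded keys."""
    downloaded = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith("/"):
            # Zero-byte "folder" placeholder: there is no file to save, and its
            # key is a directory path locally (the cause of NotADirectoryError).
            continue
        # Create the key's parent directories, e.g. product/myproject/2021-02-15
        Path(os.path.dirname(obj.key)).mkdir(parents=True, exist_ok=True)
        bucket.download_file(obj.key, obj.key)
        downloaded.append(obj.key)
    return downloaded
```

Adding the same endswith("/") check to any of the loops above avoids the error in the same way.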