
Downloading the files from s3 recursively using boto python

I have a bucket in S3 which has a deep directory structure. I wish I could download them all at once. My files look like this:

foo/bar/1
...
foo/bar/100
...

Are there any ways to download these files recursively from the s3 bucket using boto lib in python?

Thanks in advance.

You can download all files in a bucket like this (untested):

import logging
from boto.s3.connection import S3Connection

conn = S3Connection('your-access-key', 'your-secret-key')
bucket = conn.get_bucket('bucket')
for key in bucket.list():
    try:
        # write each object to a local file named after its key
        key.get_contents_to_filename(key.name)
    except Exception:
        logging.info(key.name + ":" + "FAILED")

Keep in mind that folders in S3 are simply another way of writing the key name, and only clients will show them as folders.
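Because of that, a nested key such as foo/bar/1 has no local directory until you create one, so the snippet above fails for nested keys. A minimal sketch of how to handle that (also untested; the download_root path is a placeholder I have added):

import os
from boto.s3.connection import S3Connection

conn = S3Connection('your-access-key', 'your-secret-key')
bucket = conn.get_bucket('bucket')

download_root = './s3-download'  # placeholder local target directory

for key in bucket.list():
    # keys ending in '/' are "folder" placeholders, not real objects to download
    if key.name.endswith('/'):
        continue
    local_path = os.path.join(download_root, key.name)
    # create the local directory implied by the key prefix before downloading
    local_dir = os.path.dirname(local_path)
    if local_dir and not os.path.exists(local_dir):
        os.makedirs(local_dir)
    key.get_contents_to_filename(local_path)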

#!/usr/bin/env python

import boto
import sys, os
from boto.s3.key import Key
from boto.exception import S3ResponseError


DOWNLOAD_LOCATION_PATH = os.path.expanduser("~") + "/s3-backup/"
if not os.path.exists(DOWNLOAD_LOCATION_PATH):
    print ("Making download directory")
    os.mkdir(DOWNLOAD_LOCATION_PATH)


def backup_s3_folder():
    BUCKET_NAME = "your-bucket-name"
    AWS_ACCESS_KEY_ID = os.getenv("AWS_KEY_ID")          # set AWS_KEY_ID in your environment
    AWS_ACCESS_SECRET_KEY = os.getenv("AWS_ACCESS_KEY")  # set AWS_ACCESS_KEY in your environment
    conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
    bucket = conn.get_bucket(BUCKET_NAME)

    # go through the list of keys in the bucket
    bucket_list = bucket.list()

    for l in bucket_list:
        key_string = str(l.key)
        s3_path = DOWNLOAD_LOCATION_PATH + key_string
        try:
            print("Current File is ", s3_path)
            l.get_contents_to_filename(s3_path)
        except (OSError, S3ResponseError):
            # the key is a "folder" placeholder (or the download failed);
            # make sure the corresponding local directory exists
            if not os.path.exists(s3_path):
                try:
                    os.makedirs(s3_path)
                except OSError as exc:
                    # guard against race conditions
                    import errno
                    if exc.errno != errno.EEXIST:
                        raise




if __name__ == '__main__':
    backup_s3_folder()

import boto, os

LOCAL_PATH = 'tmp/'

AWS_ACCESS_KEY_ID = 'YOUR_AWS_ACCESS_KEY_ID'
AWS_SECRET_ACCESS_KEY = 'YOUR_AWS_SECRET_ACCESS_KEY'
bucket_name = 'your_bucket_name'

# connect to the bucket
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)

# go through the list of files
bucket_list = bucket.list()
for l in bucket_list:
    keyString = str(l.key)
    d = LOCAL_PATH + keyString
    try:
        l.get_contents_to_filename(d)
    except OSError:
        # check if dir exists
        if not os.path.exists(d):
            os.makedirs(d)  # creates dirs recursively

Just added the directory-creation part to @j0nes' comment.

from boto.s3.connection import S3Connection
import os

conn = S3Connection('your-access-key', 'your-secret-key')
bucket = conn.get_bucket('bucket')

for key in bucket.list():
    print(key.name)
    if key.name.endswith('/'):
        # "folder" placeholder key: create the matching local directory
        if not os.path.exists('./' + key.name):
            os.makedirs('./' + key.name)
    else:
        key.get_contents_to_filename('./' + key.name)

This will download files to the current directory and will create directories when needed.

If you have more than 1000 files in the folder, you need to use a paginator to iterate through them:

import os

import boto3

# create the client object
client = boto3.client(
    's3',
    aws_access_key_id=S3_ACCESS_KEY,
    aws_secret_access_key=S3_SECRET_KEY
)

# bucket and prefix to download
bucket = 'bucket-name'
data_key = 'key/to/data/'

paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=data_key):
    for obj in page['Contents']:
        key = obj['Key']
        tmp_dir = '/'.join(key.split('/')[0:-1])
        # create the local directory for this key if it does not exist yet
        if tmp_dir and not os.path.exists(tmp_dir):
            os.makedirs(tmp_dir)
        # skip "folder" placeholder keys, download everything else
        if not key.endswith('/'):
            client.download_file(bucket, key, tmp_dir + '/' + key.split('/')[-1])
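
Alternatively, the higher-level boto3 resource API pages through large listings for you. A minimal sketch, assuming the same placeholder bucket name and prefix as above and credentials configured via the default credential chain:

import os

import boto3

s3 = boto3.resource('s3')          # credentials come from the default chain (env vars, config file, role)
bucket = s3.Bucket('bucket-name')  # same placeholder bucket name as above

for obj in bucket.objects.filter(Prefix='key/to/data/'):
    if obj.key.endswith('/'):      # skip "folder" placeholder keys
        continue
    local_dir = os.path.dirname(obj.key)
    if local_dir and not os.path.exists(local_dir):
        os.makedirs(local_dir)
    # download each object to a local path that mirrors its key
    bucket.download_file(obj.key, obj.key)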

import boto
from boto.s3.key import Key

keyId = 'YOUR_AWS_ACCESS_KEY_ID'
sKeyId = 'YOUR_AWS_SECRET_ACCESS_KEY'
bucketName = 'your_bucket_name'

conn = boto.connect_s3(keyId, sKeyId)
bucket = conn.get_bucket(bucketName)

# copy keys under the data/ and nlu_data/ prefixes into local model/data/
# and model/nlu_data/ folders (which must already exist)
for key in bucket.list():
    print(">>>>>" + key.name)
    pathV = key.name.split('/')
    if pathV[0] == "data":
        if pathV[1] != "":
            srcFileName = key.name
            filename = key.name.split('/')[1]
            destFileName = "model/data/" + filename
            k = Key(bucket, srcFileName)
            k.get_contents_to_filename(destFileName)
    elif pathV[0] == "nlu_data":
        if pathV[1] != "":
            srcFileName = key.name
            filename = key.name.split('/')[1]
            destFileName = "model/nlu_data/" + filename
            k = Key(bucket, srcFileName)
            k.get_contents_to_filename(destFileName)
