![](/img/trans.png)
[英]Is there a way to merge multiple CSV files uploaded to AWS S3 bucket using Python?
[英]download multiple files which were last uploaded or today uploaded files from s3 bucket using python code
我正在嘗試使用python代碼從s3存儲桶中下載最后上傳的文件或今天上傳的文件。 使用下面的代碼不想下載所有文件嗎?
#!/usr/bin/env python
import boto
import sys, os
from boto.s3.key import Key
from boto.exception import S3ResponseError
DOWNLOAD_LOCATION_PATH = os.path.expanduser("~") + "/s3-backup/"
if not os.path.exists(DOWNLOAD_LOCATION_PATH):
print ("Making download directory")
os.mkdir(DOWNLOAD_LOCATION_PATH)
def backup_s3_folder():
BUCKET_NAME = "xxxx"
AWS_ACCESS_KEY_ID= os.getenv("xxxxx") # set your AWS_KEY_ID on your environment path
AWS_ACCESS_SECRET_KEY = os.getenv("xxxxxx") # set your AWS_ACCESS_KEY on your environment path
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)
#goto through the list of files
bucket_list = bucket.list()
for l in bucket_list:
key_string = str(l.key)
s3_path = DOWNLOAD_LOCATION_PATH + key_string
try:
print ("Current File is ", s3_path)
l.get_contents_to_filename(s3_path)
except (OSError,S3ResponseError) as e:
pass
# check if the file has been downloaded locally
if not os.path.exists(s3_path):
try:
os.makedirs(s3_path)
except OSError as exc:
# let guard againts race conditions
import errno
if exc.errno != errno.EEXIST:
raise
if __name__ == '__main__':
backup_s3_folder()
據我了解,您只想檢索“今天”上載的文件,到今天,我的意思是同一天根據os上的日期時間執行此代碼。 您可以通過在foreach語句上添加條件以評估每個元素的最后修改屬性來實現此目的,因此它將遵循以下原則:
#!/usr/bin/env python
import boto
import sys, os
from boto.s3.key import Key
from boto.exception import S3ResponseError
#Get today's date
date_today = datetime.datetime.today().replace(hour=0, minute=0, second=0, microsecond=0)
DOWNLOAD_LOCATION_PATH = os.path.expanduser("~") + "/s3-backup/"
if not os.path.exists(DOWNLOAD_LOCATION_PATH):
print ("Making download directory")
os.mkdir(DOWNLOAD_LOCATION_PATH)
def backup_s3_folder():
BUCKET_NAME = "xxxx"
AWS_ACCESS_KEY_ID= os.getenv("xxxxx") # set your AWS_KEY_ID on your environment path
AWS_ACCESS_SECRET_KEY = os.getenv("xxxxxx") # set your AWS_ACCESS_KEY on your environment path
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)
#goto through the list of files
bucket_list = bucket.list()
for l in bucket_list:
key_string = str(l.key)
s3_path = DOWNLOAD_LOCATION_PATH + key_string
try:
if boto.utils.parse_ts(l.last_modified) > date_today:
print ("Current File is ", s3_path)
l.get_contents_to_filename(s3_path)
except (OSError,S3ResponseError) as e:
pass
# check if the file has been downloaded locally
if not os.path.exists(s3_path):
try:
os.makedirs(s3_path)
except OSError as exc:
# let guard againts race conditions
import errno
if exc.errno != errno.EEXIST:
raise
if __name__ == '__main__':
backup_s3_folder()
所有這些操作就是獲取今天的午夜日期,並檢查s3文件的最后修改屬性是否大於該值(因此該文件已“今天”上傳/修改)。
請記住,在檢索上一個修改后的屬性時,可能需要對時區進行一些處理,請確保將其轉換為運行此腳本的系統所在的時區!
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.