
How to upload a large number of files to Amazon S3 efficiently using boto3?

I have tens of thousands of ~10 MB files in a local directory and I'm uploading them to an Amazon S3 bucket with boto3, one file at a time. The only problem is that uploading this many files sequentially takes a very long time. Are there more efficient ways (multithreading or multiprocessing) to upload the files to Amazon S3?

 root_path ="/home/shivraj/folder/" path = root_path+'folder_raw/' # use your path dest_path = root_path+'folder_parsed/' backup_path = root_path+'folder_backup/' def parse_ivn_files(): src_files_list = glob.glob(path + "*.txt.zip") # .log files in the path files try: if src_files_list: for file_ in src_files_list: df = pd.read_csv(file_,compression="zip",sep="|", header=None) file = file_.replace(path,'') file_name = file.replace(".txt.zip",'') df.columns=["Date","Time","System_Event","Event_Type","Event_sub_type","Latitude","Longitude","Field_1","Field_2","Field_3","Field_4","Event_Number","Event_Description"] new_df=df['Event_Description'].str.split(',',expand=True) large_df = pd.concat([df,new_df],axis=1) large_df.to_csv(dest_path+file_name+".csv",index=False) s3.meta.client.upload_file(dest_path+file_name+".csv", 's3-bucket-name-here', 'ivn_parsed/'+file_name+".csv") s3.meta.client.upload_file(path+file_name+".txt.zip", 's3-bucket-name-here', 'ivn_raw_backup/'+file_name+"_bk.txt.zip") os.rename(path+file_name+".txt.zip", backup_path+file_name+"_bk.txt.zip") else: print("No files in the source folder") except: raise FileNotFoundError 

I'd go for s4cmd - it's a nice tool that can upload your files in parallel and solves some other problems too:

https://github.com/bloomreach/s4cmd
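For a directory of already-prepared files, the usual pattern is a single recursive sync of the local folder to the bucket prefix, e.g. something like s4cmd dsync /home/shivraj/folder/folder_parsed/ s3://s3-bucket-name-here/ivn_parsed/ (the exact subcommand and flags vary between versions, so check s4cmd --help); it runs the transfers over multiple threads by default.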
