
Uploading a file from Databricks DBFS / local to an S3 bucket. How do I upload a file from Databricks to an S3 bucket using the boto3 library or by mounting S3?

I have tried the following ways to upload my file to S3, all of which ultimately end up storing not the data but the path to the data.

import boto3
s3 = boto3.resource('s3')

OR

s3 = boto3.client(
    's3',
    aws_access_key_id="key_id",
    aws_secret_access_key="access_key")

s3.Object('bucket/folder/','xyz.csv').upload_file(Filename='/mnt/folder/xyz.csv')

--> Gives me an error FileNotFoundError: [Errno 2] No such file or directory: '/mnt/folder/xyz.csv'
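(As a side note, the first argument to Object() should be the bucket name alone, with the folder prefix moved into the key; the FileNotFoundError itself comes from the local path, as the accepted fix further down explains. A minimal sketch of the corrected call, reusing the question's placeholder bucket and paths:)

import boto3

s3 = boto3.resource('s3')

# Bucket name only as the first argument; the folder prefix belongs in the key.
# The local path uses the /dbfs prefix so Python's file APIs can see the DBFS mount.
s3.Object('bucket', 'folder/xyz.csv').upload_file(Filename='/dbfs/mnt/folder/xyz.csv')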


s3.put_object(Body='/databricks/driver/xyz.csv', Bucket='bucket', Key='folder/xyz.csv')

--> Executes successfully, but when opened the uploaded file contains nothing but this string - '/databricks/driver/xyz.csv'
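(This happens because Body= expects the file's contents, i.e. bytes or an open file object, not a path string, so the path itself gets stored. A minimal sketch of the same call with the contents passed in, reusing the question's placeholder path and bucket:)

import boto3

s3 = boto3.client('s3')

# Pass the file's contents as Body, not the path string.
with open('/databricks/driver/xyz.csv', 'rb') as f:
    s3.put_object(Body=f, Bucket='bucket', Key='folder/xyz.csv')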


s3.Object('bucket', 'folder/xyz.csv').put(Body="/FileStore/folder/xyz.csv")

--> Executes successfully, but when opened the uploaded file contains nothing but this string - '/FileStore/folder/xyz.csv'


bucket = s3.Bucket('bucket')
s3.Object('bucket/folder', 'xyz.csv').put(Body=open('/FileStore/folder/xyz.csv', 'rb'))

--> Gives me an error FileNotFoundError: [Errno 2] No such file or directory: '/FileStore/folder/xyz.csv'


with open('/mnt/folder/xyz.csv', "rb") as f:
    s3.upload_fileobj(f, 'bucket', 'folder/xyz.csv')

--> Gives me an error FileNotFoundError: [Errno 2] No such file or directory: '/mnt/folder/xyz.csv'


s3.meta.client.upload_file('/mnt/folder/xyz.csv', 'bucket', 'folder/xyz.csv')

--> Gives me an error FileNotFoundError: [Errno 2] No such file or directory: '/mnt/folder/xyz.csv'
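(These FileNotFoundError attempts all fail for the same reason: on Databricks, Python's local file APIs only see DBFS paths under the /dbfs prefix, so /mnt/folder/xyz.csv must be opened as /dbfs/mnt/folder/xyz.csv. A minimal sketch, assuming the mount and bucket placeholders from the question:)

import os
import boto3

s3 = boto3.client('s3')

# DBFS paths are exposed to local file APIs under the /dbfs prefix.
local_path = '/dbfs/mnt/folder/xyz.csv'
print(os.path.exists(local_path))  # sanity check before uploading

with open(local_path, 'rb') as f:
    s3.upload_fileobj(f, 'bucket', 'folder/xyz.csv')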


Kindly let me know if there are any typos or grammatical mistakes, or if I need to change the structure of the question. Thanks!

The code below worked fine for me:

import boto3
from botocore.client import Config
ACCESS_KEY = 'YOUR_ACCESS_KEY'
SECRET_KEY = 'YOUR_SECRET_KEY'
AWS_BUCKET_NAME = "BUCKET_NAME"

s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
    config=Config(signature_version='s3v4')
)

s3.meta.client.upload_file('/dbfs/FileStore/filename.csv', AWS_BUCKET_NAME, 'filename.csv')

I have found the answer to my question:

  1. Instead of using put_object(), I used upload_file().

  2. Secondly, when reading from DBFS (Databricks File System), always prefix the folder path with "/dbfs".

Remember, the leading forward slash (/) is important.


import boto3
s3_client = boto3.client('s3')
response = s3_client.upload_file('/dbfs/FileStore/folder/xyz.csv', 'bucket', 'folder/xyz.csv')
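(The question also mentions mounting S3 as an alternative; for completeness, a hedged sketch of that route using dbutils inside a Databricks notebook. The mount point is a placeholder, and it assumes the cluster already has AWS credentials, e.g. an instance profile:)

# Mount the bucket once, then copy within DBFS; this only runs in a Databricks notebook.
dbutils.fs.mount(source="s3a://bucket", mount_point="/mnt/s3-bucket")
dbutils.fs.cp("dbfs:/FileStore/folder/xyz.csv", "dbfs:/mnt/s3-bucket/folder/xyz.csv")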
