
Changing the working directory to an S3 bucket on AWS

Currently I am working on a SageMaker notebook instance and trying to change my working directory to an AWS S3 bucket. I am using the following code:

import os
os.chdir('s3://bucket-name')

This fails with: FileNotFoundError: [Errno 2] No such file or directory: 's3://bucket-name'. However, the code below, which reads a CSV file from the same bucket, works:

import boto3
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()
bucket='bucket-name'
data_key = 'some_file.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

df = pd.read_csv(data_location)

How can I change the working directory to an S3 bucket?

Guy is right, as usual ;)

If you still want to use pandas, you need to install the s3fs library ('pip install s3fs'), and then you can do this:

import pandas as pd
my_file = pd.read_csv('s3://my_bucket/my_prefix/my_file.csv')
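The same mechanism works in the other direction: once s3fs is installed, pandas can write DataFrames straight back to an s3:// URL. A minimal sketch (the bucket and key names are placeholders, and the actual upload only runs when you execute the script with credentials in place):

```python
import pandas as pd

# Toy DataFrame standing in for whatever you want to persist
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

if __name__ == '__main__':
    # Requires s3fs ('pip install s3fs') and valid AWS credentials;
    # 'my_bucket'/'my_prefix' are placeholder names.
    df.to_csv('s3://my_bucket/my_prefix/out.csv', index=False)
```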

S3 is not a file system, so you can't simply change the working directory to it. Many libraries, such as pandas, can read and write directly to S3, but they need helper libraries (such as s3fs) to make that work.
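To make the "not a file system" point concrete: S3 stores a flat set of keys, and "directories" are just key prefixes. So the closest thing to cd-ing into a folder is listing the keys under a prefix. A sketch with boto3 (preinstalled on SageMaker; the bucket and prefix names are placeholders):

```python
def s3_uri(bucket, key):
    """Build the s3:// URI for a given bucket and key."""
    return 's3://{}/{}'.format(bucket, key)

if __name__ == '__main__':
    # Requires valid AWS credentials; 'bucket-name'/'my_prefix/' are placeholders.
    import boto3
    s3 = boto3.client('s3')
    # List up to 10 keys under a "directory-like" prefix
    resp = s3.list_objects_v2(Bucket='bucket-name', Prefix='my_prefix/', MaxKeys=10)
    for obj in resp.get('Contents', []):
        print(s3_uri('bucket-name', obj['Key']))
```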

The simplest option is to copy the files from S3 to the local drive (EBS or EFS) of the notebook instance:

aws s3 cp s3://bucket_name/some_file.csv data/

The AWS CLI is already installed on the notebook instance, and if you attached the right IAM permissions when you launched your notebook instance, the copy command should work.
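If you'd rather do the same copy from Python instead of the CLI, boto3's download_file is the equivalent of aws s3 cp. A sketch using the same placeholder bucket and key as the command above:

```python
import os

def local_path(target_dir, key):
    """Where the copied object will land on the notebook's local disk."""
    return os.path.join(target_dir, os.path.basename(key))

if __name__ == '__main__':
    # Requires valid AWS credentials; 'bucket_name'/'some_file.csv'
    # are the placeholders from the answer above.
    import boto3
    s3 = boto3.client('s3')
    os.makedirs('data', exist_ok=True)
    s3.download_file('bucket_name', 'some_file.csv',
                     local_path('data', 'some_file.csv'))
```

After this, the file is on local EBS/EFS storage, so ordinary os.chdir and relative paths work on it.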
