
How to load image data from an S3 bucket into a SageMaker notebook?

I just started using AWS SageMaker. I tried to import images from my S3 bucket into a SageMaker notebook, but I can't get them into the notebook. My images are located at s3://my_bucket/train. How can I import the train folder from that path into my SageMaker notebook? I've gone through some of the solutions here, but they are for CSV files. All the images in my S3 bucket are in .jpeg format.

You don't have to download images from the S3 bucket to the local SageMaker instance to train a model. If you want to pull them for data exploration or analysis, you can use the AWS CLI from your SageMaker notebook. The following command downloads a sample image: it copies sample.jpg to the images directory under your current working directory.

aws s3 cp s3://my_bucket/train/sample.jpg ./images/sample.jpg
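The same single-file copy can be sketched in Python with boto3, which is usually preinstalled on SageMaker notebook instances. This is a minimal sketch, not the answer author's method: the helper names split_s3_uri and download_s3_uri are made up for illustration, and the download step assumes the notebook's role has read access to the bucket.

```python
import os


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, key


def download_s3_uri(uri, local_path):
    """Download one object; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(uri)
    # Create the target directory (e.g. ./images) if it does not exist yet
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    boto3.client("s3").download_file(bucket, key, local_path)


# Example call, mirroring the CLI command above:
# download_s3_uri("s3://my_bucket/train/sample.jpg", "./images/sample.jpg")
```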

Take a look at the amazon-sagemaker-examples repo to learn how to work with image formats on SageMaker.

Which SageMaker sample notebook(s) are you trying to use?

If your input data is in an S3 bucket, you don't have to download it to your SageMaker notebook instance. This example notebook uploads its data to an S3 bucket and trains from there: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.ipynb

For training, you can point s3_train at the S3 location of your input data.
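As a rough sketch of what that could look like, assuming s3_train is simply the S3 URI string of the training prefix (the helper channel_uri and the data_channels dict are hypothetical names, and passing such a dict to a SageMaker estimator's fit call is an assumption to check against the notebook you are following):

```python
def channel_uri(bucket, prefix):
    """Return the S3 URI for a data channel, e.g. 's3://my_bucket/train/'."""
    return "s3://{}/{}/".format(bucket, prefix.strip("/"))


# Bucket and folder names taken from the question
s3_train = channel_uri("my_bucket", "train")

# Assumption: a mapping of channel name to S3 URI is what the training
# job is given as input, so the images never touch the notebook's disk.
data_channels = {"train": s3_train}
print(data_channels)
```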

And here's a reference on the image input format for the built-in image classification algorithm.

You can also use s3fs to access your bucket, and the image files in it, directly.

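import s3fs
from IPython.display import display
from PIL import Image

fs = s3fs.S3FileSystem()

# List the first 5 files under a prefix in a bucket you can access
fs.ls('s3://bucket-name/data/')[:5]

# Open an image directly from S3 and show it in the notebook
with fs.open('s3://bucket-name/data/image.png') as f:
    display(Image.open(f))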
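To go from one image to a whole folder, the listing and opening steps can be combined. The sketch below is only illustrative: list_images and load_images are hypothetical helper names, fs is assumed to be an s3fs.S3FileSystem (or any object with the same ls and open methods), and Pillow is assumed to be installed.

```python
def list_images(fs, prefix, exts=(".jpeg", ".jpg", ".png")):
    """Return the sorted paths under `prefix` that end with an image extension.

    `fs` is any object with an s3fs-style ls(path, detail=False) method.
    """
    paths = fs.ls(prefix, detail=False)
    return sorted(p for p in paths if p.lower().endswith(exts))


def load_images(fs, paths):
    """Open each path with Pillow and return a dict of path -> image."""
    from PIL import Image  # imported lazily; requires Pillow

    images = {}
    for path in paths:
        with fs.open(path, "rb") as f:
            img = Image.open(f)
            img.load()  # read the pixel data before the file handle closes
            images[path] = img
    return images


# Example usage in a notebook (bucket and prefix are placeholders):
# import s3fs
# fs = s3fs.S3FileSystem()
# images = load_images(fs, list_images(fs, "bucket-name/data/"))
```

Calling img.load() matters here because Pillow reads image data lazily, and the S3 file handle is closed as soon as the with block ends.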

The code snippet below downloads the contents of your S3 bucket, including your image folder, to your SageMaker instance:


import os

import boto3
from botocore.exceptions import ClientError  # only needed for the optional error handling

# Remember to enter the correct bucket region below
s3 = boto3.resource('s3', region_name='us-west-2')
# Replace the placeholder with your bucket name
bucket = s3.Bucket('my_bucket')

# Use bucket.objects.filter(Prefix='train/') to download only the train folder
for my_bucket_object in bucket.objects.all():
    key = my_bucket_object.key
    print(key)
    if key.endswith('/'):
        continue  # skip zero-byte "folder" placeholder objects
    local_dir = os.path.dirname(key)
    if local_dir:  # top-level keys have no directory to create
        os.makedirs(local_dir, exist_ok=True)

    # The try/except below is optional error handling
    try:
        bucket.download_file(key, key)
    except ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("No object with this key.")
        else:
            raise
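Since the question only asks for the .jpeg files under train, the loop above can be narrowed. The following is a sketch under assumptions rather than part of the original answer: plan_downloads and download_prefix are hypothetical helpers, the local target directory data is arbitrary, and the bucket is assumed to be readable with the notebook's credentials.

```python
import os


def plan_downloads(keys, prefix="train/", dest="data", exts=(".jpeg", ".jpg")):
    """Map S3 keys under `prefix` to local paths under `dest`.

    Skips folder placeholders (keys ending in '/') and non-image files.
    """
    plan = []
    for key in keys:
        if not key.startswith(prefix) or key.endswith("/"):
            continue
        if not key.lower().endswith(exts):
            continue
        relative = key[len(prefix):]
        plan.append((key, os.path.join(dest, *relative.split("/"))))
    return plan


def download_prefix(bucket_name, prefix="train/", dest="data"):
    """Download every image under `prefix`; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so plan_downloads works without boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    # Filter on the server side so only keys under the prefix are listed
    keys = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
    for key, local_path in plan_downloads(keys, prefix, dest):
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        bucket.download_file(key, local_path)


# Example usage with the bucket from the question:
# download_prefix("my_bucket", prefix="train/", dest="data")
```

Keeping the key-to-path mapping in its own function makes it easy to check which files would be fetched before any download starts.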

Alternatively, you can run the following command from a SageMaker notebook cell (here $my_bucket expands to the Python variable my_bucket, which should hold your bucket name):

!aws s3 cp s3://$my_bucket/train/images train/images/ --recursive

The solution is from this website.

See also the documentation on using cp.

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address.

 