Accessing instance storage in AWS SageMaker notebooks

Posted 2023-01-04 17:59:59. Tags: amazon-web-services, amazon-ec2, amazon-sagemaker, amazon-efs, amazon-sagemaker-studio

I'm trying to train a model using AWS SageMaker notebooks and am disappointed with how slowly it trains. I think the bottleneck is the IOPS of the persistent storage (EFS and EBS) that my SageMaker notebooks read the dataset from.
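One way to test the suspicion that storage IOPS is the bottleneck is to time small random reads against a file that lives on the storage in question. The following is only a rough sketch, not a substitute for a proper benchmarking tool: the function name `measure_random_reads` and its parameters are made up for illustration, and because the OS page cache is not bypassed, a file that was recently written or read will report cache speed rather than device speed.

```python
import os
import random
import time


def measure_random_reads(path, block_size=4096, num_reads=2000, seed=0):
    """Time random block-sized reads from `path`.

    Returns (reads_per_second, megabytes_per_second).
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    num_blocks = size // block_size  # number of full blocks in the file
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(num_reads):
            f.seek(rng.randrange(num_blocks) * block_size)
            bytes_read += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return num_reads / elapsed, bytes_read / elapsed / 1e6
```

Running it once against a large dataset file on the EFS or EBS volume and once against a copy elsewhere gives a crude comparison; use a file much larger than RAM, or one that has not been touched recently, to limit caching effects.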

First, I tried training on a SageMaker Studio ml.g4dn.xlarge instance, then moved everything over to a SageMaker notebook ml.g4dn.xlarge instance through Jupyter. Even though g4dn.xlarge instances come with a physically attached 125 GB SSD, I'm unable to access it, because SageMaker Studio automatically creates an EFS store and SageMaker notebook instances automatically create an EBS store. How can I store my dataset on the 125 GB SSD instead of EFS or EBS to increase IOPS?
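To confirm which kind of storage actually backs a dataset directory, one option on Linux is to read `/proc/mounts` and find the longest mount point that contains the path. This is a minimal sketch under that assumption; the helper names `parse_mounts` and `backing_mount` are hypothetical, and the function does not resolve symlinks.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts


def backing_mount(path, mounts):
    """Return the mount entry with the longest mount point containing `path`."""
    path = os.path.abspath(path)
    best = None
    for device, mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = (device, mount_point, fs_type)
    return best


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(backing_mount(os.getcwd(), parse_mounts(f.read())))
```

A filesystem type such as `nfs4` would point to network storage like EFS, while a local block device suggests EBS or a local disk; telling those two apart still requires checking the device itself.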

1 answer

There are instance types that are memory optimised for large amounts of data. In your case, if the dataset is fed to the model at exactly that size (that is, with no upstream preprocessing to reduce the amount of data), note that the g4dn is EBS optimised.

The most obvious answer I can think of is to use an S3 bucket.

From "Maximum transfer speed between Amazon EC2 and Amazon S3":

Traffic between Amazon EC2 and Amazon S3 can leverage up to 100 Gbps of bandwidth to VPC endpoints and public IPs in the same region.

Besides being very fast, it is also the best solution in terms of design for all the components of your project on AWS. It does entail different costs and a different architecture, but you get the maximum speed that the set of AWS services can offer (possibly with special configuration for even better performance).
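As a rough illustration of the S3 approach, the sketch below copies every object under a prefix to a local directory using boto3's `list_objects_v2` paginator and `download_file`. The function name `download_prefix`, the bucket, the prefix, and the destination directory are all placeholders, not values from the question, and whether the destination sits on fast local storage depends on how the instance is set up.

```python
import os


def download_prefix(bucket, prefix, dest_dir, client=None):
    """Download every object under s3://bucket/prefix into dest_dir.

    Returns the list of local file paths that were written.
    """
    if client is None:
        import boto3  # imported lazily; requires boto3 and AWS credentials
        client = boto3.client("s3")
    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            # Keep the directory layout below the prefix.
            relative = key[len(prefix):].lstrip("/") or os.path.basename(key)
            local_path = os.path.join(dest_dir, relative)
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded


if __name__ == "__main__":
    # Hypothetical bucket, prefix, and destination; replace with your own.
    files = download_prefix("my-example-bucket", "datasets/train/", "/tmp/dataset")
    print(f"downloaded {len(files)} files")
```

The download here is sequential for clarity; for many small files, running several downloads concurrently is usually needed to get closer to the available bandwidth.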

My advice is to follow the AWS guidelines for developing a complex project from scratch: Build, training and deployment of machine learning models.