
Accessing S3 bucket on AWS ParallelCluster

I need to access an S3 bucket from the AWS ParallelCluster nodes. I explored the s3_read_write_resource option in the ParallelCluster documentation, but it is not clear how the bucket can actually be accessed. For example, will it be mounted on the nodes, or will users be able to access it by default? I tested the latter by trying to access a bucket I had declared with the s3_read_write_resource option in the config file, but was not able to access it ( aws s3 ls s3://<name-of-the-bucket> ).
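For context, the relevant part of my config looks roughly like this (illustrative only; the bucket name is the same placeholder as above):

```ini
; ParallelCluster (v2) config excerpt -- sketch, not my literal file
[cluster default]
s3_read_write_resource = arn:aws:s3:::<name-of-the-bucket>/*
```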

I also went through this github issue discussing mounting an S3 bucket with s3fs. In my experience, accessing objects through s3fs is very slow.

So, my question is:

How can we access the S3 bucket when using the s3_read_write_resource option in the AWS ParallelCluster config file?

These parameters are used in ParallelCluster to add S3 permissions to the instance role that is created for cluster instances. They are mapped to the CloudFormation template parameters S3ReadResource and S3ReadWriteResource, which are then used in the CloudFormation template (for example, here and here ). There is no special way of accessing S3 objects.

To access S3 from a cluster instance, use the AWS CLI or any SDK. Credentials are obtained automatically from the instance role through the instance metadata service.
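As a sketch, downloading and uploading an object from a cluster node could look like this (bucket and key names are placeholders; no aws configure step is needed because the instance profile supplies credentials):

```shell
# Fetch an object from a bucket declared in s3_read_resource or s3_read_write_resource.
aws s3api get-object --bucket name-of-the-bucket --key input/data.txt /tmp/data.txt

# Write a result back; this requires the bucket to be in s3_read_write_resource.
aws s3 cp /tmp/results.txt s3://name-of-the-bucket/output/results.txt
```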

Please note that ParallelCluster does not grant permission to list S3 objects.

Retrieving existing objects from the S3 bucket defined in s3_read_resource, as well as retrieving and writing objects to the S3 bucket defined in s3_read_write_resource, should work.

However, "aws s3 ls" or "aws s3 ls s3://name-of-the-bucket" need additional permissions.但是,“aws s3 ls”或“aws s3 ls s3://name-of-the-bucket”需要额外的权限。 See https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/ .请参阅https://aws.amazon.com/premiumsupport/knowledge-center/s3-access-denied-listobjects-sync/

I wouldn't use s3fs: it is not supported by AWS, it has been reported to be slow (as you've already noticed), and there are other reasons .

You might want to check the FSx section . It can create and attach an FSx for Lustre filesystem, which can natively import/export files from/to S3. You just need to set import_path and export_path in that section.
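As a sketch, the FSx settings in a ParallelCluster (v2) config could look like this (section label, shared_dir, capacity, and bucket name are all placeholders):

```ini
[cluster default]
fsx_settings = myfsx

; FSx for Lustre filesystem linked to an S3 bucket
[fsx myfsx]
shared_dir = /fsx
storage_capacity = 1200
import_path = s3://name-of-the-bucket
export_path = s3://name-of-the-bucket/export
```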
