AWS S3 data lake cross account usage

We have the following scenario: AWS Account A (application) writes data from an application to an S3 bucket owned by Account B (data lake). The analysts in Account C (reporting) want to process the data and build reports and dashboards on top of it.

Account A can write data to the data lake with --acl bucket-owner-full-control so that Account B gets access. But Account C still cannot see or process the data.
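For illustration, the same write can be sketched from Python. This is a minimal sketch, assuming boto3 is the SDK in use; the helper name `upload_with_owner_control`, the bucket name, and the object key are hypothetical. The S3 client is passed in so the function itself does not depend on any particular credential setup.

```python
def upload_with_owner_control(s3_client, bucket, key, body):
    """Write an object and grant the bucket owner full control over it.

    `s3_client` is assumed to be a boto3 S3 client created with
    Account A credentials. The ACL value mirrors the CLI flag
    --acl bucket-owner-full-control.
    """
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ACL="bucket-owner-full-control",
    )


# Hypothetical usage (requires boto3 and Account A credentials):
#
#   import boto3
#   upload_with_owner_control(
#       boto3.client("s3"),
#       "yourdatalakebucket",
#       "incoming/example.csv",
#       b"id,value\n1,42\n",
#   )
```

As described above, this gives Account B access to the object; it does not by itself make the object readable from Account C.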

One (in our eyes bad) solution is to copy the data to the same location (overwrite) as Account B, effectively taking ownership of the data in the process and eliminating the issue. We don't want that, because it is ugly.

We tried assuming roles in the different accounts, but that does not work for all of our infrastructure. E.g., S3 access via the CLI or console works, but access from EMR in Account C does not. We also have on-premises infrastructure (local task runners), where this mechanism is not an option.

Maintaining IAM roles for all accounts and users is too much effort. We are aiming for an automatic solution, not one where we have to take action every time a new user or account is added.

Do you have any suggestions?

You can do this by following this documentation:

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html

Steps:

  1. Create a SAML provider
  2. Create a role for the SAML provider (example below)
  3. Assign roles to users based on SAML conditions

E.g., you can create S3 Readers and S3 Writers roles and assign permissions based on those.

Example trust policy for assuming a role with SAML:

{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "arn:aws:iam::ACCOUNT-ID-WITHOUT-HYPHENS:saml-provider/ExampleOrgSSOProvider"},
        "Action": "sts:AssumeRoleWithSAML",
        "Condition": {"StringEquals": {
          "saml:edupersonorgdn": "ExampleOrg",
          "saml:aud": "https://signin.aws.amazon.com/saml"
        }}
      }]
}

Hope it helps.

One nice and clean way is to use a bucket policy that grants read access to the external account (Account C) by supplying the account ARN as the principal.

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "Grant read access to reporting account",
         "Effect": "Allow",
         "Principal": {
            "AWS": "arn:aws:iam::insertReportingAccountIdHere:root"
         },
         "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetObject",
            "s3:GetObjectAcl"
         ],
         "Resource": [
            "arn:aws:s3:::yourdatalakebucket",
            "arn:aws:s3:::yourdatalakebucket/*"
         ]
      }
   ]
}

This lets the reporting account manage the (ListBucket, GetObject) permissions on the bucket for its own users, meaning you can now create an IAM policy in Account C with permission to fetch data from the specified data lake bucket:

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "Allow reading files from the data lake",
         "Effect": "Allow",
         "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetObject",
            "s3:GetObjectAcl"
         ],
         "Resource": [
            "arn:aws:s3:::yourdatalakebucket",
            "arn:aws:s3:::yourdatalakebucket/*"
         ]
      }
   ]
}

This policy can then be attached to any Account C IAM role or user group you want. For example, you could attach it to your standard Developer or Analyst roles to give access to large groups of users, or you could attach it to a service role to give a particular service access to the bucket.

There is a guide on the Amazon S3 documentation site on how to do this.
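Since the question asks for something that does not need manual work for every new account, the bucket policy above could be generated from a list of account IDs instead of being edited by hand. The following is only a sketch under that assumption; the function name `build_read_policy`, the Sid, and the example account IDs are hypothetical, while the actions and resource ARNs are copied from the policy above.

```python
import json

# Read-only actions taken from the bucket policy shown above.
READ_ACTIONS = [
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:GetObject",
    "s3:GetObjectAcl",
]


def build_read_policy(bucket, reader_account_ids):
    """Return a bucket policy (as a dict) granting read access to each account."""
    principals = [
        f"arn:aws:iam::{account_id}:root" for account_id in reader_account_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GrantReadAccessToReaderAccounts",
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": READ_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# Placeholder account IDs, for illustration only.
policy = build_read_policy("yourdatalakebucket", ["111111111111", "222222222222"])
print(json.dumps(policy, indent=3))
```

The resulting JSON string could then be applied from Account B, for example with boto3's `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`. Adding a new reporting account then means adding one ID to the list and re-applying the policy.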

In our case, we solved it using roles in the DataLake account (B), both for write (WriterRole) and read (ReaderRole) access. When writing to the DataLake from Account A, your writer assumes the "WriterRole" in Account B, which has the required permissions. When reading from Account C, you assume the "ReaderRole". We solved the EMR reading issues by using IAM roles for EMRFS requests ( https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html ).
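A rough sketch of the assume-role step, assuming boto3 is used: the helper names `role_arn` and `assume_role_credentials`, the session name, and the account ID placeholder are hypothetical, and only the role names WriterRole and ReaderRole come from the description above. The STS client is passed in, so the caller decides which account's credentials are used.

```python
def role_arn(account_id, role_name):
    """Build the ARN of an IAM role in the given account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def assume_role_credentials(sts_client, account_id, role_name, session_name):
    """Assume a role and return temporary credentials as boto3 client kwargs.

    `sts_client` is assumed to be a boto3 STS client created with the
    caller's own credentials (Account A for writing, Account C for reading).
    """
    response = sts_client.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Hypothetical usage from Account C (requires boto3):
#
#   import boto3
#   kwargs = assume_role_credentials(
#       boto3.client("sts"), "<Account B id>", "ReaderRole", "reporting-session"
#   )
#   s3 = boto3.client("s3", **kwargs)
#   s3.list_objects_v2(Bucket="yourdatalakebucket")
```

The temporary credentials expire, so long-running jobs would need to repeat the call. For EMR, the linked EMRFS documentation describes the mechanism used instead of code like this.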
