
Hadoop server connection for copying files from HDFS to AWS S3

The requirement is to copy HDFS files from a Hadoop cluster (non-AWS) to an AWS S3 bucket with a standalone Java application scheduled via a daily CRON job. I would be using the AmazonS3.copyObject() method for the copy. How do I specify the Kerberized server connection details for the source Hadoop cluster so that the S3 client can access the files in the source HDFS folder?

The command below was used earlier, but it is not a secure way of transferring files:

hadoop distcp -Dfs.s3a.access.key=<<access-key>> -Dfs.s3a.secret.key=<<secret-key>> hdfs://nameservice1/test/test1/folder s3a://<<bucket>>/test/test1/folder

S3 doesn't go near Kerberos; your cron job will have to use kinit against a keytab to authenticate for the HDFS access.
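A minimal sketch of that cron-driven login, assuming a hypothetical keytab path, principal, and bucket name (none of these appear in the question):

```shell
# crontab entry: every day at 02:00, obtain a Kerberos TGT from a keytab,
# then run the distcp copy. Keytab path, principal, and bucket are placeholders.
0 2 * * * kinit -kt /etc/security/keytabs/etl.keytab etl@EXAMPLE.COM && \
    hadoop distcp hdfs://nameservice1/test/test1/folder s3a://<<bucket>>/test/test1/folder
```

Because `kinit -kt` reads the key from the keytab file, no password ever appears in the crontab or the process list.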

The most secure way to pass secrets to distcp is to keep them in a JCEKS file in the cluster FS, such as one in the home directory of the user running the job, with permissions that allow reading only by that user (maximum paranoia: set a password for encrypting the file and pass that in with the job). See "Protecting S3 Credentials with Credential Providers".
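To sketch what that looks like, assuming a hypothetical store location of jceks://hdfs/user/etl/s3.jceks:

```shell
# Store the S3 access and secret keys in a JCEKS credential store on HDFS.
# Each command prompts for the value, so the secret never hits shell history.
hadoop credential create fs.s3a.access.key \
    -provider jceks://hdfs/user/etl/s3.jceks
hadoop credential create fs.s3a.secret.key \
    -provider jceks://hdfs/user/etl/s3.jceks

# Point distcp at the store instead of passing secrets on the command line.
hadoop distcp \
    -Dhadoop.security.credential.provider.path=jceks://hdfs/user/etl/s3.jceks \
    hdfs://nameservice1/test/test1/folder s3a://<<bucket>>/test/test1/folder
```

Restrict the store file itself with `hdfs dfs -chmod 600` so only the job's user can read it.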

One more trick to try: create session credentials using the AWS CLI assume-role command, and pass the temporary credentials to distcp for s3a to pick up. That way, yes, the secrets are visible to ps, but they aren't the longer-lived secrets. You can also request a specific role there with restricted access compared to the user's full account (e.g. read/write access to one bucket only).
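One way that could look, with a hypothetical role ARN and session name standing in for your own:

```shell
# Obtain short-lived credentials for a restricted role (ARN is a placeholder).
CREDS=$(aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/distcp-one-bucket \
    --role-session-name distcp-daily \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text)
read -r KEY SECRET TOKEN <<< "$CREDS"

# Hand the temporary credentials to distcp; because a session token is
# involved, s3a must use its temporary-credentials provider.
hadoop distcp \
    -Dfs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
    -Dfs.s3a.access.key="$KEY" \
    -Dfs.s3a.secret.key="$SECRET" \
    -Dfs.s3a.session.token="$TOKEN" \
    hdfs://nameservice1/test/test1/folder s3a://<<bucket>>/test/test1/folder
```

The credentials expire on their own (one hour by default), so even if they leak via the process table the exposure window is small.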
