
distcp from s3 to hadoop - file not found

I am getting the error below saying the input file does not exist, but the file does exist at the URL shown. I am new to distcp and am using Cloudera, FYI.

 https://s3.amazonaws.com/test-development/test/201305031003_0_ubuntu.gz


ubuntu@ubuntu:~$ hadoop distcp -i 201305031003_0_ubuntu.gz s3://id:key@test-development/test/201305031003_0_ubuntu.gz
13/05/04 14:54:29 INFO tools.DistCp: srcPaths=[201305031003_0_ubuntu.gz]
13/05/04 14:54:29 INFO tools.DistCp: destPath=s3://id:key@test-development/test/201305031003_0_ubuntu.gz
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source 201305031003_0_ubuntu.gz does not exist.
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)

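The first argument to distcp is the source, so it should be the S3 path, not the bare file name. The path should also use s3n:// (the native S3 filesystem) rather than s3:// (the block filesystem), unless the data was originally written to S3 through s3://.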
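As a rough sketch of the corrected argument order, the command might look like the following. The source URI reuses the bucket, path, and `id:key` placeholders from the question; the HDFS destination `hdfs:///user/ubuntu/` is only an assumption for illustration and should be replaced with a directory that exists on your cluster. The sketch prints the command instead of running it, so it can be checked before use.

```shell
# Source: the S3 object, addressed through the native s3n:// scheme.
# "id" and "key" are the placeholders from the question, not real credentials.
SRC="s3n://id:key@test-development/test/201305031003_0_ubuntu.gz"

# Destination: a hypothetical HDFS directory (assumption); substitute your own.
DEST="hdfs:///user/ubuntu/"

# Print the command for review; remove "echo" to actually run it on the cluster.
echo hadoop distcp "$SRC" "$DEST"
```

With the source first and the destination second, distcp looks for the input on S3 rather than for a file named 201305031003_0_ubuntu.gz in the default filesystem, which is what the "Input source ... does not exist" message was complaining about.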
