
Hadoop - copy dataset from external to HDFS directly

I am trying to copy a ~500 MB compressed file to HDFS using distcp, but I get a connection timeout error:

hadoop distcp  hftp://s3.amazonaws.com/path/to/file.gz hdfs://namenode/some/hdfs/dir

Here is the full error:

java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
    at sun.net.www.http.HttpClient.New(HttpClient.java:308)
    at sun.net.www.http.HttpClient.New(HttpClient.java:326)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
    at org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:328)
    at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:461)
    at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:476)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1715)
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:429)
    at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:91)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:181)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:493)

What is the correct way to copy such a large file to HDFS? I am using CDH 5.14.

Thanks!

Use the s3a filesystem scheme instead:

hadoop distcp s3a://hwdev-examples-ireland/datasets /tmp/datasets2
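A minimal sketch of how AWS credentials can be passed to the s3a connector for a one-off distcp run; the key values here are placeholders, and the source/destination paths are the ones from the command above. (In practice the keys would normally live in core-site.xml, or be picked up from an EC2 instance profile, rather than appear on the command line.)

```shell
# Hypothetical credential values -- replace with your own, or configure
# fs.s3a.access.key / fs.s3a.secret.key in core-site.xml instead.
hadoop distcp \
  -Dfs.s3a.access.key=AKIA_EXAMPLE_KEY \
  -Dfs.s3a.secret.key=EXAMPLE_SECRET \
  s3a://hwdev-examples-ireland/datasets /tmp/datasets2
```

Unlike hftp (a read-only HTTP interface to an HDFS cluster, which does not speak to S3), s3a talks to the S3 API directly, so the listing step that timed out above goes through S3 rather than an HTTP port that is not answering.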


