

Download a large file from HDFS

I was given a DataInputStream from an HDFS client for a large file (around 2 GB), and I need to store it as a file on my host.

I was thinking about using Apache Commons IOUtils and doing something like this...

File temp = getTempFile(localPath);
try (DataInputStream dis = HDFSClient.open(filepath); // around 2GB file (zipped)
     InputStream in = new BufferedInputStream(dis);
     OutputStream out = new FileOutputStream(temp)) {
    IOUtils.copy(in, out);
}

I was looking for other solutions that might work better than this approach. My major concern is that buffering happens twice: once in the BufferedInputStream and again inside IOUtils.copy...

For files larger than 2 GB it is recommended to use IOUtils.copyLarge() (if we are speaking about the same IOUtils: org.apache.commons.io.IOUtils).
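As a side note, if adding commons-io is not an option, the standard library has offered an equivalent long-returning copy since Java 9 via InputStream.transferTo. A minimal runnable sketch (an in-memory stream stands in for the HDFS one here):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class TransferDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MB stand-in for the HDFS stream
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            // transferTo returns a long byte count, like copyLarge()
            long copied = in.transferTo(out);
            System.out.println(copied); // 1048576
        }
    }
}
```

In the real download you would pass the DataInputStream from HDFSClient.open and a FileOutputStream for the temp file instead of the in-memory streams.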

The copy in IOUtils uses a default buffer size of 4 KB (although you can specify another buffer size as a parameter).

The difference between copy() and copyLarge() is the returned result.

For copy(), if the stream is bigger than 2 GB the copy will succeed, but the result is -1.

For copyLarge(), the result is exactly the number of bytes you copied.
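To see why the return types differ, here is a rough sketch of the buffered read loop that copyLarge() performs (the helper below is an illustration, not the actual commons-io source). The count accumulates in a long, so it cannot overflow past 2 GB the way an int-returning copy() would:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoop {
    // Illustrative equivalent of copyLarge(): read into a buffer until EOF,
    // accumulating the total in a long instead of an int.
    static long copyLarge(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            count += n;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyLarge(new ByteArrayInputStream(new byte[10_000]), out, 4096);
        System.out.println(copied); // 10000
    }
}
```

The bufSize parameter plays the same role as the custom buffer you can pass to the commons-io methods.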

See more in the documentation here: http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/IOUtils.html#copyLarge(java.io.InputStream,%20java.io.OutputStream)

