
Is it possible to read the data that is being written to HDFS

I have a simple Java program that writes to HDFS continuously. My understanding is that once a particular block is written in HDFS, it can be accessed by other clients, but in my case I have not been able to do so. I am writing a file of 39 KB, with each write starting 100 ms after the previous one. I check the status of the file in Hue, but it shows 0 bytes while the writing operation is in progress; only after the write completes does it show the complete file. I want to be able to read the data written to the file in at least, say, 4 KB chunks. I am using the default configuration. Is my assumption correct? If so, what am I doing wrong? I am using a VM with CDH 4.4.

The consistency model section of Hadoop: The Definitive Guide says: "After creating a file, it is visible in the filesystem namespace, as expected. However, any content written to the file is not guaranteed to be visible, even if the stream is flushed. So the file appears to have a length of zero. Once more than a block's worth of data has been written, the first block will be visible to new readers."

The hsync() or hflush() method of FSDataOutputStream should guarantee that the data is visible.
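A minimal sketch of a writer that calls hflush() after each write, so that readers can see the data before the block is complete. The path and the 100 ms interval mirror the question; the class name and path are just examples, not anything from the original post:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFlushExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Example path; replace with your own HDFS path
        Path path = new Path("/tmp/stream.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            for (int i = 0; i < 10; i++) {
                out.writeBytes("chunk " + i + "\n");
                // hflush() pushes buffered data out to the datanodes so that
                // new readers can see it; hsync() additionally asks the
                // datanodes to sync the data to disk.
                out.hflush();
                Thread.sleep(100); // one write every 100 ms, as in the question
            }
        }
    }
}
```

Without the hflush() call, data sits in the client-side buffer and other clients (and Hue) see a zero-length file until a full block is written or the stream is closed.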

Using the Hadoop FileSystem API, you can read a file that has already been written to HDFS.

Here are some URLs with code snippets which might be helpful to you:

Read a file from HDFS in Hadoop classes in Java

Reading data from HDFS programmatically using Java
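Along the lines of those links, reading a file back might look like the following sketch. The path is hypothetical and assumes the Hadoop configuration is on the classpath:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Example path; any readable HDFS path works here
        Path path = new Path("/tmp/stream.txt");
        // fs.open() returns an FSDataInputStream, which we wrap for line reads
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

A reader opened this way will see whatever data has been flushed to the datanodes at the time of the open() call.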
