New to Hadoop...I have a series of HDFS directories with the naming convention filename.seq. Each directory contains an index, data and bloom file. These have binary content and appear to be SequenceFiles (SEQ starts the header). I want to know the structure/schema. Everything I read refers to reading an individual sequence file so I'm not sure how to read these or how they were produced. Thanks.
Update: I've tried recommended tools for streaming & outputting text on the files, none worked:
hadoop fs -text /path/to/hdfs-filename.seq/data | head
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar \
-input /path/to/hdfs-filename.seq/data \
-output /tmp/outputfile \
-mapper "/bin/cat" \
-reducer "/bin/wc -l" \
-inputformat SequenceFileAsTextInputFormat
Error was:
ERROR streaming.StreamJob: Job not successful. Error: NA
The SEQ header confirms that hadoop sequence file. (One thing that I have never seem is the bloom file that you mentioned.)
The structure / schema of a typical Sequence file is:
For more details:
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.