How do I read a Hadoop SequenceFile as the input to a Hadoop job?

Question

I have a SequenceFile whose keys and values are of type "org.apache.hadoop.typedbytes.TypedBytesWritable". I have to provide this file as the input to a Hadoop job and process it in the map phase only; I don't need to do anything that requires a reduce.

1) How do I specify a SequenceFile as the FileInputFormat?

2) What will the signature of the map function be?

3) How do I get the output from the map instead of from the reduce?

Answer 1

1) How do I specify a SequenceFile as the FileInputFormat?

Set SequenceFileAsBinaryInputFormat as the input format. Here is the code:

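import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat;
JobConf conf = new JobConf(getConf(), getClass());
conf.setInputFormat(SequenceFileAsBinaryInputFormat.class);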

2) What will the signature of the map function be?

The map function is invoked with BytesWritable as both the key type and the value type.
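To make that signature concrete, here is a minimal sketch of a mapper against the old org.apache.hadoop.mapred API, the same API the JobConf snippet uses. The class name RawRecordMapper and the pass-through behaviour are assumptions for illustration only; a real job would decode the raw bytes instead of forwarding them.

```java
import java.io.IOException;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper name; input key and value are both BytesWritable
// because SequenceFileAsBinaryInputFormat hands over the raw record bytes.
public class RawRecordMapper extends MapReduceBase
        implements Mapper<BytesWritable, BytesWritable, BytesWritable, BytesWritable> {

    @Override
    public void map(BytesWritable key, BytesWritable value,
                    OutputCollector<BytesWritable, BytesWritable> output,
                    Reporter reporter) throws IOException {
        // Note: getBytes() may return a buffer longer than the record;
        // only the first getLength() bytes are valid.
        // This sketch simply forwards each record unchanged.
        output.collect(key, value);
    }
}
```

The output types are also BytesWritable here only because the sketch forwards its input; they can be any Writable types the job needs.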

3) How do I get the output from the map instead of from the reduce?

Set the mapred.reduce.tasks property to 0. The output of the map will then be the final output of the job.
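Putting the three points together, a map-only job configuration might look roughly like the sketch below. The class name MapOnlyJobConfig, the job name, and the choice of IdentityMapper and SequenceFileOutputFormat are assumptions made for the example, not something the question prescribes. JobConf.setNumReduceTasks(0) is the programmatic way of setting the reduce-task count to 0.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

// Hypothetical helper class that only configures the job; it does not submit it.
public class MapOnlyJobConfig {

    public static JobConf configure(JobConf conf, String inputDir, String outputDir) {
        conf.setJobName("map-only-sequencefile"); // illustrative name

        // 1) Read the SequenceFile as raw key/value bytes.
        conf.setInputFormat(SequenceFileAsBinaryInputFormat.class);

        // 2) The mapper receives BytesWritable keys and values.
        //    IdentityMapper stands in for a real mapper here.
        conf.setMapperClass(IdentityMapper.class);

        // 3) No reducers: the map output becomes the job output.
        conf.setNumReduceTasks(0);

        // Assumed output settings for this sketch.
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        conf.setOutputKeyClass(BytesWritable.class);
        conf.setOutputValueClass(BytesWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(inputDir));
        FileOutputFormat.setOutputPath(conf, new Path(outputDir));
        return conf;
    }
}
```

The configured JobConf can then be submitted with JobClient.runJob(conf). With zero reducers, each map task writes its output straight to the output directory, with no shuffle or sort step.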

Also, take a look at SequenceFileAsTextInputFormat. With it, the map is invoked with Text as both the key type and the value type.

Answer 1
3 Accepted 2012-01-11 14:26:10
