Using the output of one Hadoop job as the input of another

Question

I am having a problem using the output of one M/R job as the input of another. According to this post, and many other online resources, one way to do this is to create a job1 and then a job2. However, when I do this, I get this error:

Error: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.LongWritable

The only place I use LongWritable is as the input key type of my mapper. I remember that this needs to stay as it is, because it is the offset in the input file. When I change the signature to use Text, like so:

public class ErrorMapperCombiner extends Mapper<Text, Text, Text, IntWritable>

I get this error:

Error: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.Text

So how can I use the output of one M/R job as the input to another?

I am using this in my "runner" class to chain the two:

job1.setOutputFormatClass(SequenceFileOutputFormat.class);

job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);

.....

job2.setInputFormatClass(SequenceFileInputFormat.class);
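
For reference, both ClassCastExceptions quoted above can be reproduced without Hadoop. The sketch below is a plain-Java simulation, not Hadoop code: FakeMapper, FakeText, FakeLongWritable and FakeIntWritable are hypothetical stand-ins for Mapper, Text, LongWritable and IntWritable. It assumes the first job wrote Text keys and IntWritable values, as the settings above suggest. Because generic type parameters are erased at runtime, a mapper's declared input types are only checked when a record is handed to it.

```java
public class KeyTypeMismatchDemo {
    // Hypothetical stand-ins for Hadoop's LongWritable, Text and IntWritable.
    static final class FakeLongWritable { }
    static final class FakeText { }
    static final class FakeIntWritable { }

    // Hypothetical stand-in for Mapper<KEYIN, VALUEIN, ...>.
    abstract static class FakeMapper<K, V> {
        abstract void map(K key, V value);
    }

    // Declares (offset, line) input, like a mapper written for plain text files.
    static final class OffsetLineMapper extends FakeMapper<FakeLongWritable, FakeText> {
        @Override
        void map(FakeLongWritable key, FakeText value) { }
    }

    // Declares (Text, Text) input, like the changed signature in the question.
    static final class TextTextMapper extends FakeMapper<FakeText, FakeText> {
        @Override
        void map(FakeText key, FakeText value) { }
    }

    // Declares (Text, IntWritable) input, matching what the first job is assumed to write.
    static final class TextIntMapper extends FakeMapper<FakeText, FakeIntWritable> {
        @Override
        void map(FakeText key, FakeIntWritable value) { }
    }

    // The caller only sees erased types, so a mismatch surfaces as a
    // ClassCastException at call time rather than as a compile error.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String feed(FakeMapper mapper, Object key, Object value) {
        try {
            mapper.map(key, value);
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        Object key = new FakeText();          // assumed key type written by job1
        Object value = new FakeIntWritable(); // assumed value type written by job1
        System.out.println(feed(new OffsetLineMapper(), key, value)); // key mismatch
        System.out.println(feed(new TextTextMapper(), key, value));   // value mismatch
        System.out.println(feed(new TextIntMapper(), key, value));    // types line up
    }
}
```

In this simulation only the mapper whose declared input types match the records it receives runs without an exception.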

Answer 1

I was setting the value twice, e.g.:

job1.setOutputFormatClass(TextOutputFormat.class);
....
job1.setOutputFormatClass(SequenceFileOutputFormat.class);

The SequenceFileOutputFormat call came after the TextOutputFormat one, so I would have expected that value to be used. Nevertheless, it works now.
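
For anyone wiring this up from scratch, here is a minimal sketch of a two-job chain with a single output format call on job1. Everything not shown in the question is an assumption made for illustration: the class names ChainedJobsSketch, FirstStageMapper and SecondStageMapper, the three paths taken from args, and the trivial mapper bodies. Both jobs rely on Hadoop's default identity reducer. The second mapper's input types are declared as Text and IntWritable so that they match what job1 writes to the sequence file.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainedJobsSketch {

    // Hypothetical first-stage mapper: reads text lines (offset, line)
    // and emits (line, 1).
    public static class FirstStageMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(line, ONE);
        }
    }

    // Hypothetical second-stage mapper: its input types mirror job1's
    // output types (Text key, IntWritable value), not (LongWritable, Text).
    public static class SecondStageMapper
            extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text key, IntWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value); // pass-through, for illustration only
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);        // assumed: plain text input
        Path intermediate = new Path(args[1]); // assumed: job1 output, job2 input
        Path output = new Path(args[2]);       // assumed: final output

        Job job1 = Job.getInstance(conf, "job1");
        job1.setJarByClass(ChainedJobsSketch.class);
        job1.setMapperClass(FirstStageMapper.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(IntWritable.class);
        // Set the output format exactly once.
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        Job job2 = Job.getInstance(conf, "job2");
        job2.setJarByClass(ChainedJobsSketch.class);
        job2.setMapperClass(SecondStageMapper.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```

SequenceFileInputFormat hands the mapper the same key and value classes that were written to the file, which is why the LongWritable offset key only applies when a job reads plain text.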

0 votes, accepted, 2013-11-15 06:21:35
