
WrongValueClass in Apache Mahout

I have written a MapReduce program using Mahout. The map output value is ClusterWritable. When I run the code in Eclipse it runs without error, but when I run the jar file in the terminal it throws this exception:

java.io.IOException: wrong value class: org.apache.mahout.math.VectorWritable is not class org.apache.mahout.clustering.iterator.ClusterWritable
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:988)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:74)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.mahout.clustering.canopy.CanopyMapper.cleanup(CanopyMapper.java:59)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

The output code in the map is:

context.write(new Text(), new ClusterWritable());

but I don't know why it says that the value type is VectorWritable.

The mapper being run, which produces the stack trace above, is Mahout's CanopyMapper, not the custom one you've written. The CanopyMapper.cleanup method outputs (key: Text, value: VectorWritable). See CanopyMapper.java.
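To make the mismatch concrete, here is a simplified paraphrase (not Mahout's actual source) of the kind of cleanup method that line 59 of the stack trace points at; the key text and field names are assumptions for illustration:

```java
// Simplified sketch of a CanopyMapper-style cleanup (paraphrase, not Mahout's code).
// Note the value written is a VectorWritable, regardless of what value class
// the surrounding job was configured with.
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
  for (Canopy canopy : canopies) { // 'canopies' accumulated during map()
    context.write(new Text(canopy.getIdentifier()),
                  new VectorWritable(canopy.computeCentroid()));
  }
}
```

Since the map phase here runs with a direct output collector (no reducer, per `NewDirectOutputCollector` in the trace), the SequenceFile writer checks each appended value against the job's declared output value class and fails immediately on the mismatch.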

See also CanopyDriver.java and its buildClustersMR method, where the MR job is configured: the mapper, the reducer, and the appropriate output key/value classes.
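The error pattern can be reproduced with a hypothetical job setup like the following sketch (names and the exact configuration are assumptions, not Mahout's driver code): the job declares ClusterWritable as the output value class, but the mapper that actually runs emits VectorWritable.

```java
// Hypothetical job configuration illustrating the "wrong value class" error.
// This is a sketch under assumptions, not Mahout's actual buildClustersMR code.
Job job = new Job(conf, "canopy-example");
job.setMapperClass(CanopyMapper.class);            // Mahout's mapper runs, not a custom one
job.setNumReduceTasks(0);                          // map output goes straight to the writer
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(ClusterWritable.class);    // declared value class...
job.setOutputFormatClass(SequenceFileOutputFormat.class);
// ...but CanopyMapper.cleanup writes VectorWritable values, so
// SequenceFile$Writer.append throws:
//   java.io.IOException: wrong value class: ... VectorWritable is not class ... ClusterWritable
```

The fix is to make the declared output value class match what the configured mapper actually emits, or to make sure your own mapper (with its own output classes) is the one the job runs.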

You didn't say, so I'm guessing that you're using more than one MR job in a data flow pipeline. Check that the output of each job in the pipeline is valid/expected input for the next job in the pipeline. Consider using Cascading/Scalding to define your data flow (see http://www.slideshare.net/melrief/scalding-programming-model-for-hadoop ).
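One quick way to check what a previous job in the pipeline actually wrote is to read the key/value class names out of its SequenceFile header. A minimal sketch, assuming the classic Hadoop 1.x API that matches the stack trace above (the output path is an example, not taken from the question):

```java
// Inspect the key/value classes recorded in a SequenceFile's header.
// Useful for verifying that one job's output matches the next job's expected input.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path part = new Path("canopy-output/part-m-00000"); // example path
SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
try {
  System.out.println("key class:   " + reader.getKeyClassName());
  System.out.println("value class: " + reader.getValueClassName());
} finally {
  reader.close();
}
```

If this prints VectorWritable where the downstream job expects ClusterWritable, the mismatch is between pipeline stages rather than inside a single job.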

Consider using the Mahout user mailing list to post Mahout-related questions.
