
Amazon Hadoop 2.4 + Avro 1.7.7: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

I'm trying to run the following code on EMR, and it throws the aforementioned exception. Does anyone know what might be going wrong? I'm using avro-tools-1.7.7 to compile my schemas.

After a bit of research, I'm beginning to suspect this is an Avro problem that could be fixed by building with Maven and editing the dependencies, or perhaps by downgrading to an earlier Amazon Hadoop version. But I've never used Maven, and changing the Hadoop version breaks a lot of my other code.
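For reference, the usual Maven-side fix for this error on Avro 1.7.x is to depend on avro-mapred built against the Hadoop 2 API via the hadoop2 classifier. A minimal pom.xml sketch, with version numbers assumed from the setup above (Avro 1.7.7, Hadoop 2.4.0):

    <!-- Minimal sketch: Avro artifacts built against the Hadoop 2 API.
         Versions assumed from the question (Avro 1.7.7, Hadoop 2.4.0). -->
    <dependencies>
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.7.7</version>
      </dependency>
      <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-mapred</artifactId>
        <version>1.7.7</version>
        <!-- Without this classifier, avro-mapred is compiled against Hadoop 1,
             where TaskAttemptContext is a class rather than an interface. -->
        <classifier>hadoop2</classifier>
      </dependency>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.4.0</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>

The "Found interface ... but class was expected" message is the classic symptom of this mismatch: TaskAttemptContext was a class in Hadoop 1 and became an interface in Hadoop 2, so bytecode compiled against one breaks on the other.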

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.avro.mapreduce.AvroKeyValueOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;

// DocumentSchema (Avro-generated), IndexValue, and DynamoStorage are my own classes.
public class MapReduceIndexing extends Configured implements Tool {
    static int number_of_documents;
    static DynamoStorage ds = new DynamoStorage();

    public static class IndexMapper extends Mapper<AvroKey<DocumentSchema>, NullWritable, Text, IndexValue> {
        @Override
        public void map(AvroKey<DocumentSchema> key, NullWritable value, Context context) throws IOException, InterruptedException {
            System.out.println("inside map start");

            // some mapper code, e.g. (all_words and iv come from the omitted logic):
            for (String word : all_words.keySet()) {
                context.write(new Text(word), iv);
            }
            System.out.println("inside map end");
        }
    }

    public static class IndexReducer extends Reducer<Text, IndexValue, AvroKey<CharSequence>, AvroValue<Integer>> {
        @Override
        public void reduce(Text key, Iterable<IndexValue> iterable_values, Context context) throws IOException, InterruptedException {
            System.out.println("inside reduce start");
            // some reducer code
            System.out.println("inside reduce end");
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "indexing");
        job.setJarByClass(MapReduceIndexing.class);
        job.setJobName("Making inverted index");

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setInputFormatClass(AvroKeyInputFormat.class);
        job.setMapperClass(IndexMapper.class);
        AvroJob.setInputKeySchema(job, DocumentSchema.getClassSchema());
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IndexValue.class);

        job.setOutputFormatClass(AvroKeyValueOutputFormat.class);
        job.setReducerClass(IndexReducer.class);
        AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
        AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.INT));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // count the input documents in the S3 bucket before launching the job
        AWSCredentials credentials = new BasicAWSCredentials("access key", "secret key");
        AmazonS3 s3 = new AmazonS3Client(credentials);
        ObjectListing object_listing = s3.listObjects(new ListObjectsRequest().withBucketName(args[2]));
        number_of_documents = object_listing.getObjectSummaries().size();

        int res = ToolRunner.run(new MapReduceIndexing(), args);
        System.exit(res);
    }
}

Check whether avro-tools is on your compile classpath. It bundles an org.apache.hadoop.mapreduce.TaskAttemptContext that may conflict with the version in your jar and/or on the cluster. If you need to include avro-tools for some reason, you'll have to download a version that was compiled against your version of Hadoop (Cloudera has one in their repository, but I'm not sure where to get one for EMR), or compile avro-tools yourself.
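A quick way to confirm the conflict, assuming you have a local copy of the jar, is to list its contents and look for the bundled Hadoop class:

    # avro-tools is a fat jar; if this prints the class, the jar ships its own
    # (Hadoop 1) copy of TaskAttemptContext that can shadow the cluster's version.
    jar tf avro-tools-1.7.7.jar | grep TaskAttemptContext

If the class shows up, drop avro-tools from the compile classpath and depend only on the avro and avro-mapred (hadoop2) artifacts, using the standalone avro-tools jar solely for schema compilation outside the build.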
