
Error while running a MapReduce job that reads from Elasticsearch

I get the following error when I run a MapReduce job that reads data from Elasticsearch:

java.lang.Exception: java.lang.RuntimeException: problem advancing post rec#0
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: problem advancing post rec#0
    at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1364)
    at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:220)
    at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:216)
    at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:45)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: can't find class: org.elasticsearch.hadoop.mr.LinkedMapWritable because org.elasticsearch.hadoop.mr.LinkedMapWritable
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:212)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:169)
    at org.elasticsearch.hadoop.mr.LinkedMapWritable.readFields(LinkedMapWritable.java:148)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1421)
    at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1361)
    ... 12 more
14/09/08 16:18:43 INFO mapreduce.Job: Job job_local1675221004_0001 failed with state FAILED due to: NA

My main runner class is as follows:

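import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

public class Es2 {

    private static final Path TMP_DIR = new Path(Es2.class.getSimpleName() + "_TMP_1");

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf();

        // Elasticsearch source: index/type to read and the query to run
        conf.set("es.resource", "conceptnet_data/concept");
        conf.set("es.query", "?q=me*");
        conf.setInputFormat(EsInputFormat.class);

        // Keys are document ids, values are the documents themselves
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(LinkedMapWritable.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LinkedMapWritable.class);
        conf.setOutputFormat(TextOutputFormat.class);
        conf.setMapperClass(mapper1.class);

        final Path outDir = new Path(TMP_DIR, "out");
        FileOutputFormat.setOutputPath(conf, outDir);
        JobClient.runJob(conf);
    }
}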

The mapper class is as follows:

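import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

public class mapper1 extends MapReduceBase
        implements Mapper<Object, Object, Text, MapWritable> {

    @Override
    public void map(Object key, Object value, OutputCollector<Text, MapWritable> output,
                    Reporter reporter) throws IOException {
        // Pass each document through unchanged, keyed by its id
        Text docId = (Text) key;
        MapWritable doc = (LinkedMapWritable) value;
        output.collect(docId, doc);
    }
}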

Kindly guide me on this issue.

I had the same issue and solved it by adding the elasticsearch-hadoop JARs to Hadoop's classpath.

Try something like this:

export HADOOP_CLASSPATH=/home/tariq/java/library/elasticsearch-hadoop-mr-2.0.2.jar:/home/tariq/java/library/elasticsearch-hadoop-2.0.2.jar
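
To confirm the JARs are actually being picked up, a small check like the one below may help. It is only a sketch: `ClasspathCheck` and `isClassAvailable` are hypothetical names made up for this example, not part of Hadoop or elasticsearch-hadoop, and the class loader Hadoop uses while deserializing reduce input may differ from the one checked here, so a positive result is a hint rather than a guarantee.

```java
// Hypothetical helper for illustration; not part of Hadoop or elasticsearch-hadoop.
final class ClasspathCheck {

    private ClasspathCheck() {
    }

    /**
     * Returns true if the named class can be located by the current thread's
     * context class loader (falling back to this class's own loader).
     * The class is looked up without being initialized.
     */
    static boolean isClassAvailable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace from the question.
        String name = args.length > 0
                ? args[0]
                : "org.elasticsearch.hadoop.mr.LinkedMapWritable";
        String verdict = isClassAvailable(name) ? " is" : " is NOT";
        System.out.println(name + verdict + " on the classpath");
    }
}
```

Run it with the same classpath you export for the job, for example `java -cp "$HADOOP_CLASSPATH:." ClasspathCheck`. If it reports that the class is not on the classpath, the paths in HADOOP_CLASSPATH are probably wrong for your machine.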
