Null Pointer Exception - Hadoop Mapreduce job

Question

I am a beginner with Hadoop and Java. I am writing Map and Reduce functions to cluster a set of latitude/longitude pairs into groups based on proximity. For each cluster I want to set a magnitude (the number of lat,long pairs in the cluster) and a representative lat,long pair (for now, the first pair encountered).

Here's my code:

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import util.hashing.*;



public class LatLong {


 public static class Map extends Mapper<Object, Text, Text, Text> {
    //private final static IntWritable one = new IntWritable(1);


    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] longLatArray = line.split(",");
        double longi = Double.parseDouble(longLatArray[0]);
        double lat = Double.parseDouble(longLatArray[1]);
        //List<Double> origLatLong = new ArrayList<Double>(2);
        //origLatLong.add(lat);
        //origLatLong.add(longi);
        Geohash inst = Geohash.getInstance();
        //encode is the library's encoding function
        String hash = inst.encode(lat,longi);
        //Using the first 4 characters just for testing purposes
        //Need to find the right one later
        int accuracy = 4;
        //hash of the thing is shortened to whatever I figure out
        //to be the right size of each tile
        Text shortenedHash = new Text(hash.substring(0,accuracy));
        Text origHash = new Text(hash);
        context.write(shortenedHash, origHash);
    }
 } 

 public static class Reduce extends Reducer<Text, Text, Text, Text> {

     private IntWritable totalTileElementCount = new IntWritable();
     private Text latlongimag = new Text();
     private Text dataSeparator = new Text();

     @Override
     public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
      int elementCount = 0;
      boolean first = true;
      Iterator<Text> it= values.iterator();
      String lat = new String();
      String longi = new String();
      Geohash inst = Geohash.getInstance();

      while (it.hasNext()) {
       elementCount = elementCount+1;
       if(first)
       {
           lat = Double.toString((inst.decode(it.toString()))[0]);
           longi = Double.toString((inst.decode(it.toString()))[1]);
           first = false;

       }
       @SuppressWarnings("unused")
       String blah = it.next().toString();


      }
      totalTileElementCount.set(elementCount);
      //Geohash inst = Geohash.getInstance();

      String mag = totalTileElementCount.toString();

      latlongimag.set(lat+","+ longi +","+mag+",");
      dataSeparator.set("");
      context.write(latlongimag, dataSeparator );
     }
 }

 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    job.setJarByClass(LatLong.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.waitForCompletion(true);
 }

}
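The mapper groups points by truncated geohash: points whose hashes share the first `accuracy` characters get the same key, so they reach the same reduce call and form one cluster. The following is a minimal sketch of that keying step, assuming a textbook geohash encoder. `GeohashSketch`, `encode` and `tileKey` are hypothetical names standing in for `util.hashing.Geohash`, whose real API and output may differ.

```java
// Hypothetical stand-in for util.hashing.Geohash.encode (assumption: textbook geohash).
// Bits alternate between longitude and latitude bisections; 5 bits form one base32 char.
final class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeohashSketch() {}

    /** Encodes (lat, lon) into a geohash with the given number of characters. */
    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(precision);
        boolean evenBit = true; // a geohash starts with a longitude bit
        int bit = 0;
        int ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) {
                ch |= 1;          // upper half of the interval: bit = 1
                range[0] = mid;
            } else {
                range[1] = mid;   // lower half of the interval: bit = 0
            }
            evenBit = !evenBit;
            if (++bit == 5) {     // every 5 bits become one base32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    /** Map key as in the question's mapper: the first `accuracy` chars (0..12) of the full hash. */
    static String tileKey(double lat, double lon, int accuracy) {
        return encode(lat, lon, 12).substring(0, accuracy);
    }
}
```

With this encoder, (42.6, -5.6) encodes to `ezs42`, and (42.6, -5.6) and (42.601, -5.601) both get the 4-character key `ezs4`. One caveat of prefix grouping: two close points that straddle a cell boundary get different prefixes and end up in different clusters.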

I'm getting an NPE (NullPointerException). I don't know how to test this, and I can't find the error in my code.

Hadoop Error:

    java.lang.NullPointerException
    at util.hashing.Geohash.decode(Geohash.java:41)
    at org.myorg.LatLong$Reduce.reduce(LatLong.java:67)
    at org.myorg.LatLong$Reduce.reduce(LatLong.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:663)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

The decode function from the Geohash library returns an array of doubles. Any pointers would be greatly appreciated! Thanks for your time!
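As the edits further down confirm, the reducer passes `it.toString()` to `decode`. An iterator's `toString()` is typically just `Object.toString()` (class name plus hash code), so the decoder receives text such as `...$Itr@1b6d3586` rather than a geohash. The sketch below uses a hypothetical `DecodeSketch.decode` (not the `util.hashing.Geohash` API, whose line 41 is not shown) that returns `{lat, lon}` in the order the question indexes it, and rejects non-base32 input with an explicit exception instead of failing with an NPE.

```java
// Hypothetical decoder for illustration only; assumes a textbook geohash and
// returns {lat, lon}, matching how the question reads [0] and [1].
final class DecodeSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private DecodeSketch() {}

    /** Returns the centre of the geohash cell as {lat, lon}. */
    static double[] decode(String hash) {
        double[] lat = {-90.0, 90.0};
        double[] lon = {-180.0, 180.0};
        boolean evenBit = true; // the first bit refines longitude
        for (char c : hash.toCharArray()) {
            int idx = BASE32.indexOf(c);
            if (idx < 0) {
                // An iterator's toString() lands here, e.g. on '.', '$' or 'a'.
                throw new IllegalArgumentException("not a geohash character '" + c + "' in " + hash);
            }
            for (int shift = 4; shift >= 0; shift--) {
                double[] range = evenBit ? lon : lat;
                double mid = (range[0] + range[1]) / 2;
                if (((idx >> shift) & 1) == 1) {
                    range[0] = mid; // bit 1: keep the upper half
                } else {
                    range[1] = mid; // bit 0: keep the lower half
                }
                evenBit = !evenBit;
            }
        }
        return new double[] {(lat[0] + lat[1]) / 2, (lon[0] + lon[1]) / 2};
    }
}
```

Decoding the element (`it.next()`) works, while decoding the iterator's own string representation fails fast with a message that names the offending character.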

EDIT1 (after testing):

I have realized that the problem was that the reduce function needs to call it.next().toString(), not just it.toString(). But when I made this change and tested again, I got the error below, and I don't understand why, since I am checking hasNext() in the while loop condition.

    java.util.NoSuchElementException: iterate past last value
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:159)
    at org.myorg.LatLong$Reduce.reduce(LatLong.java:69)
    at org.myorg.LatLong$Reduce.reduce(LatLong.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:663)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

EDIT2 (further testing): SOLUTION

I'm calling it.next() more than once per loop iteration. Each call advances the iterator, so in the last iteration hasNext() is true and the loop body is entered, but the body then calls it.next() twice even though only one element (the last one) remains. That second call causes the exception.
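This diagnosis can be reproduced without Hadoop: `hasNext()` only promises one more element, so each check must be followed by at most one `next()` call. A plain `java.util` iterator follows the same contract as Hadoop's `ReduceContext$ValueIterator` here. `IteratorPitfall` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative reproduction of the bug with a plain java.util iterator.
final class IteratorPitfall {
    private IteratorPitfall() {}

    /** Buggy: one hasNext() check guards two next() calls. */
    static List<String> twoNextCallsPerCheck(List<String> values) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = values.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
            seen.add(it.next()); // throws NoSuchElementException when only one element was left
        }
        return seen;
    }

    /** Fixed: exactly one next() per hasNext(). */
    static List<String> oneNextCallPerCheck(List<String> values) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = values.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }
}
```

Note that the buggy version silently succeeds on an even number of values, which is why this kind of bug can slip through small tests.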

Answer 1

You are still calling toString() on it instead of on it.next(), so you should change

lat = Double.toString((inst.decode(it.toString()))[0]);
longi = Double.toString((inst.decode(it.toString()))[1]);

into

String cords = it.next().toString();
lat = Double.toString((inst.decode(cords))[0]);
longi = Double.toString((inst.decode(cords))[1]);

Don't write inst.decode(it.next().toString()) on both lines, because that would call it.next() twice in a single iteration of the while loop.

After that, stop calling String blah = it.next().toString();, because for the same reason you will get java.util.NoSuchElementException: iterate past last value.

When you remove String blah = it.next().toString();, remember that once first is false you never enter if(first) and therefore never call String cords = it.next().toString();. The iterator then never advances, it.hasNext() keeps returning true, and you never leave the while loop. Add appropriate conditional logic so that it.next() is called exactly once per iteration.
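Putting these points together (one `next()` per iteration, no extra `next()` call, decode only the first value), a minimal Hadoop-free sketch of the reducer's loop might look like this. `TileSummary` and `summarize` are hypothetical names, and the `decode` function parameter stands in for `Geohash.getInstance().decode(...)`. In a real `Reducer` the loop would be `for (Text value : values)` with `value.toString()`; converting to a `String` also avoids holding on to the `Text` object, which Hadoop may reuse between values.

```java
import java.util.function.Function;

// Hypothetical, Hadoop-free version of the reducer body.
final class TileSummary {
    private TileSummary() {}

    /** Builds "lat,longi,mag," from the first hash in the tile and the tile's element count. */
    static String summarize(Iterable<String> hashes, Function<String, double[]> decode) {
        int elementCount = 0;
        String lat = "";
        String longi = "";
        for (String hash : hashes) {           // for-each calls next() exactly once per element
            if (elementCount == 0) {           // first element: the representative point
                double[] coords = decode.apply(hash); // decode once, reuse both coordinates
                lat = Double.toString(coords[0]);
                longi = Double.toString(coords[1]);
            }
            elementCount++;
        }
        return lat + "," + longi + "," + elementCount + ",";
    }
}
```

The for-each form removes the `first` flag and the manual `hasNext()`/`next()` bookkeeping, which is where both exceptions came from.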

Answer 2

This means that either your "it" is null or you get null from decode. Add null checks for both.
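A minimal sketch of such checks, assuming the decoder is passed in as a function (the names `SafeDecode` and `tryDecode` are hypothetical): validate the input string before decoding and the result before indexing into it.

```java
import java.util.Optional;
import java.util.function.Function;

// Illustrative guard: `decode` stands in for the library's decode call.
final class SafeDecode {
    private SafeDecode() {}

    /** Returns {lat, lon} if the hash is present and decodes to two values; otherwise empty. */
    static Optional<double[]> tryDecode(String hash, Function<String, double[]> decode) {
        if (hash == null || hash.isEmpty()) {
            return Optional.empty();           // nothing to decode
        }
        double[] coords = decode.apply(hash);
        if (coords == null || coords.length < 2) {
            return Optional.empty();           // decoder gave no usable result
        }
        return Optional.of(coords);
    }
}
```

In the question's trace the exception is raised inside Geohash.decode itself (Geohash.java:41), so a check on the return value alone would not prevent it; the input check is the part that applies there.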

2 answers

Answer 1: accepted, score 1, 2014-06-06 19:28:29
Answer 2: score 0, 2014-06-06 17:59:23
