I am a beginner with Hadoop and Java. I am writing Map and Reduce functions that cluster a set of latitude/longitude pairs into groups based on proximity, and that assign each cluster a magnitude (the number of lat,long pairs in the cluster) and a representative lat,long pair (for now, the first pair encountered).
Here's my code:
package org.myorg;

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import util.hashing.*;

public class LatLong {

    public static class Map extends Mapper<Object, Text, Text, Text> {
        //private final static IntWritable one = new IntWritable(1);

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] longLatArray = line.split(",");
            double longi = Double.parseDouble(longLatArray[0]);
            double lat = Double.parseDouble(longLatArray[1]);
            //List<Double> origLatLong = new ArrayList<Double>(2);
            //origLatLong.add(lat);
            //origLatLong.add(longi);
            Geohash inst = Geohash.getInstance();
            //encode is the library's encoding function
            String hash = inst.encode(lat,longi);
            //Using the first 5 characters just for testing purposes
            //Need to find the right one later
            int accuracy = 4;
            //hash of the thing is shortened to whatever I figure out
            //to be the right size of each tile
            Text shortenedHash = new Text(hash.substring(0,accuracy));
            Text origHash = new Text(hash);
            context.write(shortenedHash, origHash);
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        private IntWritable totalTileElementCount = new IntWritable();
        private Text latlongimag = new Text();
        private Text dataSeparator = new Text();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            int elementCount = 0;
            boolean first = true;
            Iterator<Text> it= values.iterator();
            String lat = new String();
            String longi = new String();
            Geohash inst = Geohash.getInstance();
            while (it.hasNext()) {
                elementCount = elementCount+1;
                if(first)
                {
                    lat = Double.toString((inst.decode(it.toString()))[0]);
                    longi = Double.toString((inst.decode(it.toString()))[1]);
                    first = false;
                }
                @SuppressWarnings("unused")
                String blah = it.next().toString();
            }
            totalTileElementCount.set(elementCount);
            //Geohash inst = Geohash.getInstance();
            String mag = totalTileElementCount.toString();
            latlongimag.set(lat+","+ longi +","+mag+",");
            dataSeparator.set("");
            context.write(latlongimag, dataSeparator );
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(LatLong.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
I'm getting a NullPointerException (NPE). I don't know how to test this, and I can't find the error in my code.
Hadoop Error:
java.lang.NullPointerException
at util.hashing.Geohash.decode(Geohash.java:41)
at org.myorg.LatLong$Reduce.reduce(LatLong.java:67)
at org.myorg.LatLong$Reduce.reduce(LatLong.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:663)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
The decode function from the Geohash library returns an array of doubles. Any pointers would be greatly appreciated! Thanks for your time!
EDIT1 (after testing):
I realized that the problem was that the reduce function needs it.next().toString() rather than just it.toString(). But when I made this change and tested it, I got the error below, and I don't understand why it occurs when I am checking hasNext() in the while loop condition.
java.util.NoSuchElementException: iterate past last value
at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:159)
at org.myorg.LatLong$Reduce.reduce(LatLong.java:69)
at org.myorg.LatLong$Reduce.reduce(LatLong.java:1)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:663)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:426)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
EDIT2 (further testing): SOLUTION
I was calling it.next() more than once per loop iteration. Because it is an iterator, every call advances it, so one pass through the loop body moved forward twice. On the last iteration the hasNext() check passes and the loop body is entered, but the second it.next() call fails because only one element (the last one) remains.
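The same failure can be reproduced outside Hadoop with a plain java.util.Iterator. The following is only a minimal sketch: the class name DoubleNextDemo and the method overruns are made up for illustration, and a list iterator stands in for Hadoop's value iterator.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class DoubleNextDemo {
    // Hypothetical helper: returns true if the loop ran past the end of the iterator.
    static boolean overruns(Iterator<String> it) {
        try {
            while (it.hasNext()) {
                it.next(); // consumes one element
                it.next(); // consumes another; throws if the previous one was the last
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Odd number of elements: the second next() in the last pass has nothing left.
        System.out.println(overruns(Arrays.asList("a", "b", "c").iterator())); // prints true
        // Even number of elements: the bug stays hidden.
        System.out.println(overruns(Arrays.asList("a", "b").iterator()));      // prints false
    }
}
```

Whether the exception shows up depends on how many values a key has, which is why this kind of bug can appear only for some reduce groups.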
You are still calling toString() on the iterator it rather than on it.next(), so you should change
lat = Double.toString((inst.decode(it.toString()))[0]);
longi = Double.toString((inst.decode(it.toString()))[1]);
into
String cords = it.next().toString();
lat = Double.toString((inst.decode(cords))[0]);
longi = Double.toString((inst.decode(cords))[1]);
Don't write inst.decode(it.next().toString()) on both lines, because that would call it.next() twice in one iteration of the while loop.
After that, don't call String blah = it.next().toString(); as well, because you will get java.util.NoSuchElementException: iterate past last value for the same reason as above.
And when you remove String blah = it.next().toString();, remember that once first is false you never enter if(first) and therefore never call String cords = it.next().toString();. The iterator then never advances, it.hasNext() keeps returning true, and you never leave the while loop. Add appropriate conditional statements so that it.next() is still called exactly once on every iteration.
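Putting these points together, one way to structure the loop is to call it.next() exactly once at the top of each iteration and reuse the result. This is a sketch, not the asker's final code: it runs on a plain Iterator<String>, and the decode parameter is a hypothetical stand-in for the Geohash library's decode, assumed here to return {lat, long} as a double array.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

public class TileSummary {
    // decode is a hypothetical stand-in for Geohash.decode(String) -> {lat, long}.
    static String summarize(Iterator<String> it, Function<String, double[]> decode) {
        int elementCount = 0;
        String lat = "";
        String longi = "";
        while (it.hasNext()) {
            String hash = it.next(); // the only next() call in the loop body
            if (elementCount == 0) {
                // The first value in the group becomes the representative point.
                double[] coords = decode.apply(hash);
                lat = Double.toString(coords[0]);
                longi = Double.toString(coords[1]);
            }
            elementCount++;
        }
        return lat + "," + longi + "," + elementCount + ",";
    }

    public static void main(String[] args) {
        // Toy decoder for illustration only: parses "lat:long" instead of a real geohash.
        Function<String, double[]> toy = s -> {
            String[] parts = s.split(":");
            return new double[] { Double.parseDouble(parts[0]), Double.parseDouble(parts[1]) };
        };
        System.out.println(summarize(Arrays.asList("1.5:2.5", "3.0:4.0").iterator(), toy));
        // prints 1.5,2.5,2,
    }
}
```

In the actual reducer, hash would come from it.next().toString(). Copying the value to a String right away also matters because Hadoop reuses the same Text object for every value in the group.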
This means that either "it" is null or you are getting null from decode. Add null checks for both.