Hadoop: Implement a nested for loop in MapReduce [Java]

Question

I am trying to implement a statistical formula that requires comparing each data point with every other data point. For example, my dataset is something like:

I need to go through this file like:

for (int i = 0; i < data.length; i++)
    for (int j = 0; j < data.length; j++)
        sum += data[i] + data[j];
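As a rough sketch of what this loop computes, here is a plain-Java version with no Hadoop dependency. The class and method names (PairwiseSum, nestedSum, singlePassSum) are made up for illustration. It also shows that, for this particular example sum only, the double loop collapses to a single pass: each element appears n times as data[i] and n times as data[j], so the result is 2 * n * (sum of all elements). A real statistical formula that compares pairs may not simplify this way.

```java
// Minimal sketch: the nested loop from the question, plus a single-pass
// equivalent. Class and method names are hypothetical.
class PairwiseSum {

    // Direct translation of the nested loop: O(n^2) additions.
    static double nestedSum(double[] data) {
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data.length; j++) {
                sum += data[i] + data[j];
            }
        }
        return sum;
    }

    // Single pass: every element appears n times as data[i] and n times
    // as data[j], so the double sum equals 2 * n * (sum of all elements).
    static double singlePassSum(double[] data) {
        double total = 0.0;
        for (double value : data) {
            total += value;
        }
        return 2.0 * data.length * total;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        System.out.println(nestedSum(data));
        System.out.println(singlePassSum(data));
    }
}
```

When the formula does reduce to running totals like this, a single map and reduce pass that accumulates a count and a sum is enough, and no nested loop is needed.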

Basically, when I get each line through my map function, I need to execute some instructions on the rest of the file in the reducer, as in a nested for loop. I have tried using the DistributedCache and some form of ChainMapper, but to no avail. Any idea of how I can go about this would be really appreciated. Even an out-of-the-box approach would be helpful.

Answer 1

You need to override the run method of the Reducer class.

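// Needs java.io.IOException; Text is org.apache.hadoop.io.Text (assumes a Text input key).
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
        // Key and values for the outer loop (index i).
        Text currentKey = context.getCurrentKey();
        Iterable<VALUEIN> currentValues = context.getValues();
        // Read or copy what you need from currentValues before advancing,
        // because the context moves on to the next key.
        if (context.nextKey()) {
            // getCurrentKey() and getValues() now return the next key's data,
            // which corresponds to the inner loop (index j).
        }
    }
    cleanup(context);
}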

If you don't have a reducer, you can do the same in the Mapper by overriding its run method:

@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        // Calling context.nextKeyValue() again here advances to the next
        // key/value pair, which plays the role of the inner loop (index j).
    }
    cleanup(context);
}

Let me know if this helps.
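One thing to keep in mind with both run() overrides: the context is a single forward-only cursor, so calling nextKey() or nextKeyValue() inside the loop consumes records instead of restarting from the beginning. A possible workaround, sketched below in plain Java with no Hadoop dependency, is to buffer the records first and then run the nested loop over the buffer. The Iterator stands in for the record stream, BufferedPairs and its methods are hypothetical names, and the sketch assumes the records handled by one task fit in memory. In a real job a mapper also only sees the records of its own input split.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch, not Hadoop API. The Iterator stands in for the
// forward-only record stream that a task's run() method consumes.
class BufferedPairs {

    // Drain the one-pass stream into memory so it can be traversed again.
    // Assumption: the records for one task fit in memory.
    static List<Double> buffer(Iterator<Double> records) {
        List<Double> buffered = new ArrayList<>();
        while (records.hasNext()) {
            buffered.add(records.next());
        }
        return buffered;
    }

    // With the records buffered, both loops can cover the whole list.
    static double pairwiseSum(List<Double> buffered) {
        double sum = 0.0;
        for (double left : buffered) {
            for (double right : buffered) {
                sum += left + right;
            }
        }
        return sum;
    }
}
```

In an actual Mapper or Reducer, the buffering would happen inside the while loop of run() and the pairwise step would come after the loop ends, before cleanup(context).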

solution1
0 2014-04-30 08:17:23
