
Hadoop: Implement a nested for loop in MapReduce [Java]

I am trying to implement a statistical formula that requires comparing each data point with every other data point. For example, my dataset looks something like this:

10.22
15.77
16.55
9.88

I need to iterate over this file like this:

for (int i = 0; i < data.length; i++)
    for (int j = 0; j < data.length; j++)
        sum += data[i] + data[j];

Basically, when I receive each line in my map function, I need to run some instructions against the rest of the file in the reducer, as in a nested for loop. I have tried using the DistributedCache and some form of ChainMapper, but to no avail. Any idea of how I can go about this would be really appreciated; even an unconventional approach would help.
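For the particular formula in the loop above, the pairwise sum does not actually require visiting every pair: each value appears n times as data[i] and n times as data[j], so the double sum equals 2n times the sum of all values. The plain-Java sketch below (class and method names are hypothetical, no Hadoop classes) checks this against the nested loop. The shortcut only applies to this additive example; if the real statistic does not separate like this, a pairwise approach is still needed.

```java
// Compares the O(n^2) nested loop from the question with an O(n) closed form.
// Class and method names are hypothetical, for illustration only.
class PairwiseSum {
    // Direct translation of the nested loop: every (i, j) pair, including i == j.
    static double nestedLoop(double[] data) {
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < data.length; j++) {
                sum += data[i] + data[j];
            }
        }
        return sum;
    }

    // Each value appears n times as data[i] and n times as data[j],
    // so the double sum equals 2 * n * (sum of all values).
    static double closedForm(double[] data) {
        double total = 0.0;
        for (double x : data) {
            total += x;
        }
        return 2.0 * data.length * total;
    }
}
```

With this form, one possible job design only needs each mapper to emit a record count and a partial sum, with a single reducer adding them up: one pass over the data and no nested loop.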

You need to override the run() method of the Reducer class, which is the method that drives the loop over keys and calls reduce() for each one.

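import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;

@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        while (context.nextKey()) {
            // Current key and its values: the "i" of the outer loop
            // (assumes Reducer<Text, DoubleWritable, ...>; Hadoop reuses these objects
            // and the values can only be iterated once, so copy what you need to keep)
            Text currentKey = context.getCurrentKey();
            Iterable<DoubleWritable> currentValues = context.getValues();
            if (context.nextKey()) {
                // context.getCurrentKey() / context.getValues() now return the next group,
                // the "j" of your second loop. This also advances the outer while loop.
            }
        }
    } finally {
        cleanup(context);
    }
}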
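Because the inner context.nextKey() call reads from the same stream as the outer while loop, it is worth checking which groups the loop above actually pairs up. The plain-Java simulation below uses an Iterator as a hypothetical stand-in for the reducer Context (no Hadoop classes; names are illustrative only). With four keys it visits (k1, k2) and (k3, k4), so this structure pairs adjacent groups rather than reproducing the full nested loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simulates the control flow of the overridden Reducer.run() above.
// The Iterator is a hypothetical stand-in for the reducer Context:
// hasNext()/next() play the role of context.nextKey()/getCurrentKey().
class RunLoopSimulation {
    static List<String> visitedPairs(List<String> sortedKeys) {
        List<String> pairs = new ArrayList<>();
        Iterator<String> context = sortedKeys.iterator();
        while (context.hasNext()) {                  // outer: while (context.nextKey())
            String currentKey = context.next();      // the "i" group
            if (context.hasNext()) {                 // inner: if (context.nextKey())
                String nextKey = context.next();     // the "j" group; also advances the outer loop
                pairs.add(currentKey + "," + nextKey);
            }
        }
        return pairs;
    }
}
```

Covering every pair of groups would require keeping earlier groups' values in memory yourself, since the reducer input is streamed once.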

Or, if you don't have a reducer, you can do the same in the Mapper by overriding its run() method:

@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        while (context.nextKeyValue()) {
            /* Calling context.nextKeyValue() again here advances to the next record
               (the "j" of the second loop); note that it also advances this outer loop,
               and a mapper only sees the records of its own input split. */
        }
    } finally {
        cleanup(context);
    }
}
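If the records of one split fit in memory, a common alternative to peeking ahead with nextKeyValue() is to copy each record into a list as it arrives and run the full nested loop once every record has been seen; in a Hadoop mapper that step would go in cleanup(). The sketch below is plain Java with hypothetical names and no Hadoop classes: map() and cleanup() only mimic the mapper lifecycle. It covers all pairs within one split, not across the whole file, unless the job has a single split or routes every value to one reducer key.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: buffer records, then run the nested loop at the end.
// Class and method names are hypothetical; no Hadoop classes are used.
class BufferedPairwise {
    private final List<Double> buffer = new ArrayList<>();

    // Plays the role of map(): called once per input line.
    // Parse and store a copy; Hadoop reuses the Text object it passes to map().
    void map(String line) {
        buffer.add(Double.parseDouble(line.trim()));
    }

    // Plays the role of cleanup(): all records of the split are now in memory,
    // so the original nested loop can run over them.
    double cleanup() {
        double sum = 0.0;
        for (int i = 0; i < buffer.size(); i++) {
            for (int j = 0; j < buffer.size(); j++) {
                sum += buffer.get(i) + buffer.get(j);
            }
        }
        return sum;
    }
}
```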

Let me know if this helps.
