Hadoop MR: holding object references in a list inside the reduce method

I would like to have an ArrayList that holds references to the objects passed to the reduce function.

@Override
public void reduce( final Text pKey,
                    final Iterable<BSONWritable> pValues,
                    final Context pContext )
        throws IOException, InterruptedException{
    final ArrayList<BSONWritable> bsonObjects = new ArrayList<BSONWritable>();

    for ( final BSONWritable value : pValues ){
        bsonObjects.add(value);
        //do some calculations.
    }
    for ( final BSONWritable value : bsonObjects ){
        //do something else.
    }
}

The problem is that bsonObjects.size() returns the correct number of elements, but all the elements of the list are equal to the last inserted element. For example, if the

{id:1}

{id:2}

{id:3}

elements are to be inserted, bsonObjects will hold 3 items but all of them will be {id:3}. Is there a problem with this approach? Any idea why this happens? I have tried changing the List to a Map, but then only one element was added to the map. I have also tried making bsonObjects a field of the class instead of a local variable, but the same behavior happens.

This is documented behavior. The pValues iterator reuses a single BSONWritable instance: on each iteration Hadoop reads the next value into that same object. Calling add() on bsonObjects stores a reference, not a copy, so every element of the list points to the same instance, which holds whatever was read last. Reusing the object is how Hadoop saves memory.

You should create a new BSONWritable in the first loop that is a deep copy of value, and add that copy to bsonObjects instead of value itself.

Try this:

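for ( final BSONWritable value : pValues ){
    // A plain assignment (v = value) would only copy the reference.
    // WritableUtils.clone (org.apache.hadoop.io.WritableUtils) round-trips the value
    // through serialization, so v is an independent object.
    final BSONWritable v = WritableUtils.clone( value, pContext.getConfiguration() );
    bsonObjects.add(v);
    //do some calculations.
}
for ( final BSONWritable value : bsonObjects ){
    //do something else.
}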

Then you will be able to iterate through bsonObjects in the second loop and retrieve each distinct value.
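The effect can be reproduced without a cluster. What follows is only a minimal sketch: Holder is a hypothetical stand-in for BSONWritable, and reusingValues is a hand-written iterable that mimics the object reuse. None of these names come from Hadoop or mongo-hadoop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {
    /** Hypothetical mutable value, standing in for BSONWritable. */
    static final class Holder {
        int id;

        /** Returns an independent object with the same field value. */
        Holder copy() {
            Holder h = new Holder();
            h.id = id;
            return h;
        }
    }

    /** Mimics the reducer's value iterator: one shared Holder, overwritten on every next(). */
    static Iterable<Holder> reusingValues(final int... ids) {
        final Holder shared = new Holder();
        return () -> new Iterator<Holder>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < ids.length;
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.id = ids[next++]; // overwrite the shared instance in place
                return shared;           // hand out the same reference every time
            }
        };
    }

    /** Stores the reference handed out by the iterator, as in the question. */
    public static List<Integer> idsStoredByReference(int... ids) {
        List<Holder> kept = new ArrayList<>();
        for (Holder h : reusingValues(ids)) {
            kept.add(h);
        }
        return idsOf(kept);
    }

    /** Stores an independent copy of each value instead. */
    public static List<Integer> idsStoredByCopy(int... ids) {
        List<Holder> kept = new ArrayList<>();
        for (Holder h : reusingValues(ids)) {
            kept.add(h.copy());
        }
        return idsOf(kept);
    }

    private static List<Integer> idsOf(List<Holder> holders) {
        List<Integer> out = new ArrayList<>();
        for (Holder h : holders) {
            out.add(h.id);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(idsStoredByReference(1, 2, 3)); // prints [3, 3, 3]
        System.out.println(idsStoredByCopy(1, 2, 3));      // prints [1, 2, 3]
    }
}
```

In the first method the list has the right size but every slot aliases the one shared object, which matches the symptom described in the question. Copying inside the loop is the only change needed to keep the values distinct.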

However, be careful: if you deep-copy every value, all the values for the key in this reducer will need to fit in memory.
