
Reducer's Heap out of memory

So I have a few Pig scripts that keep dying in the reduce phase of the job with errors that the Java heap keeps running out of space. To date my only solution has been to increase the reducer count, but that doesn't seem to be getting me anywhere reliable. Part of this may just be the massive growth in the data we are getting, but I can't be sure.

I've thought about changing the spill threshold setting (I can't recall the exact setting name), but I'm not sure whether that would help or just slow things down. What else can I look at doing to solve this issue?

On a side note, when this starts happening I occasionally also get errors about bash failing to get memory for what I assume is the spill operation. Would this be the Hadoop node running out of memory? If so, would just turning down the heap size on these boxes be the solution?

Edit 1
1) Pig 0.8.1
2) The only UDF is an eval UDF that just looks at single rows, with no bags or maps.
3) I haven't noticed any hotspots from bad key distribution. I have been using the prime number scale to reduce this issue as well.

Edit 2
Here is the error in question:
2012-01-04 09:58:11,179 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201112070707_75699_r_000054_1 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

Here is the bash error I keep getting:
java.io.IOException: Task: attempt_201112070707_75699_r_000054_0 - The reduce copier failed
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Cannot run program "bash": java.io.IOException: error=12, Cannot allocate memory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
    at org.apache.hadoop.util.Shell.run(Shell.java:134)
    at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:160)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2537)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)

Obviously you are running out of memory somewhere. Increasing the number of reducers is actually quite reasonable. Take a look at the stats on the JobTracker Web GUI and see how many bytes are coming out of the mappers. Divide that by the number of reduce tasks, and that is a pretty rough estimate of what each reducer is getting. Unfortunately, this only works in the long run if your keys are evenly distributed.
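If you want to raise the reducer count from inside the script rather than on the command line, a minimal sketch (the alias names and the value 50 are placeholders, not values tuned to your data) would be:

-- raise reduce-side parallelism for the whole script
SET default_parallel 50;

-- or override it on a single operator
grouped = GROUP mydata BY mykey PARALLEL 50;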

In some cases, JOIN (especially the replicated kind) will cause this type of issue. This happens when you have a "hot spot" on a particular key. For example, say you are doing some sort of join and one of those keys shows up 50% of the time. Whichever reducer is unlucky enough to handle that key is going to get clobbered. You may want to investigate which keys are causing hot spots and handle them accordingly. In my data, these hot spots are usually useless anyway. To find out what's hot, just do a GROUP BY and COUNT and figure out what's showing up a lot. Then, if it's not useful, just FILTER it out.
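As a rough illustration (the relation and field names below are made up, not taken from your script), the hot-key check could look something like this:

-- count how often each join key shows up
grouped    = GROUP mydata BY join_key;
key_counts = FOREACH grouped GENERATE group AS join_key, COUNT(mydata) AS cnt;

-- inspect the most frequent keys
sorted   = ORDER key_counts BY cnt DESC;
top_keys = LIMIT sorted 20;
DUMP top_keys;

-- if a hot key turns out to be useless (say, an empty string), drop it before the join
cleaned = FILTER mydata BY join_key != '';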

Another source of this problem is a Java UDF that is aggregating way too much data. For example, if you have a UDF that goes through a data bag and collects the records into some sort of list data structure, you may be blowing your memory with a hot spot value.

I found that the newer versions of Pig (0.8 and 0.9 in particular) have far fewer memory issues. I had quite a few instances of running out of heap in 0.7. These versions have much better spill-to-disk detection, so if it's about to blow the heap, it is smart enough to spill to disk.
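If you want that spilling to kick in earlier, one knob you could experiment with is Pig's cached-bag memory fraction. This is only a hedged sketch: the pig.cachedbag.memusage property exists, but whether SET forwards it on your Pig build, and what value is sensible for your jobs, are assumptions you would need to verify:

-- assumption: SET passes arbitrary properties through on this Pig version;
-- otherwise set it in pig.properties or with -D on the pig command line
SET pig.cachedbag.memusage 0.1;    -- fraction of heap spillable bags may use before spilling to disk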


In order for me to be more helpful, you could post your Pig script and also mention what version of Pig you are using.

I'm not an experienced user or anything, but I did run into a similar problem when running Pig jobs on a VM.

My particular problem was that the VM had no swap space configured, so it would eventually run out of memory. I guess you're trying this on a properly configured Linux box, but it wouldn't hurt to run free -m and see what you get; maybe the problem is that you have too little swap configured.

Just a thought, let me know if it helps. Good luck with your problem!
