
org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space

I have a 90 MB snappy-compressed file that I am attempting to use as input to Hadoop 2.2.0 on AMI 3.0.4 in AWS EMR.

Immediately upon attempting to read the file, my record reader gets the following exception:

2014-05-06 14:25:34,210 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:365)
...

I'm running on an m1.xlarge in AWS using the default memory and io.sort.mb settings. If we decompress the file and use that as input instead, everything works fine. The trouble is we have a very large number of compressed files and don't want to go around decompressing everything.

I'm not sure if we're missing a configuration setting or some wiring in our code, and I'm not sure how to proceed.
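
For reference, the failing read path looks roughly like the sketch below. This is a minimal, hypothetical reconstruction using Hadoop's CompressionCodecFactory and LineReader; the path argument and the processing body are placeholders, not the actual record-reader code from the job.

import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.util.LineReader;

public class SnappyLineRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);            // a *.snappy file on S3 or HDFS (placeholder)
        FileSystem fs = path.getFileSystem(conf);

        // The codec is resolved from the file extension; for *.snappy this is SnappyCodec,
        // whose createInputStream() wraps the data in a BlockDecompressorStream --
        // the same class that appears in the OutOfMemoryError stack trace above.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(path);
        InputStream in = codec.createInputStream(fs.open(path));

        // LineReader.readLine() is where the block decompression (and the OOM) is triggered.
        LineReader reader = new LineReader(in, conf);
        Text line = new Text();
        while (reader.readLine(line) > 0) {
            // process the line (placeholder)
        }
        reader.close();
    }
}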

As per the log you have provided, it seems the size of a decompressed block is larger than your available heap size.

I don't know the m1.xlarge instance specifications on EMR offhand, but here are some things you can try to ward off this error.

Usually "Error running child" means the exception was thrown inside the child JVM that runs your task; in this case, that JVM ran out of heap space.

Options to try:

1) Increase the heap given via mapred.child.java.opts. These are the JVM options for the separate child JVM that each task runs in. By default the heap is 200 MB, which is small for any reasonable data analysis. Change the parameters -XmxNu (maximum heap size of N in units u) and -XmsNu (initial heap size of N in units u). Try 1 GB, i.e. -Xmx1g, see the effect, and if it succeeds then scale back down (see the configuration sketch after this list).

2) Set mapred.child.ulimit to 1.5 or 2 times the maximum heap size set above. It limits the amount of virtual memory a child process may use.

3) Reduce mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which set the maximum number of mappers and reducers running in parallel at a time.

4) io.sort.mb, which you have already mentioned. Try keeping it in the range 0.25*mapred.child.java.opts < io.sort.mb < 0.5*mapred.child.java.opts.
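
Here is a minimal sketch of how these knobs might be set from a job driver. The 1 GB heap, 2-task limits, and 256 MB sort buffer are illustrative values, not numbers verified for an m1.xlarge; also note that the mapred.tasktracker.* properties are cluster-side (TaskTracker/MRv1) settings, so setting them per job may not take effect on every setup, and on Hadoop 2.x the mapreduce.map/reduce.java.opts equivalents are the ones that actually apply.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobMemorySettings {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Option 1: give each child task JVM a 1 GB heap (legacy name plus Hadoop 2.x per-task equivalents).
        conf.set("mapred.child.java.opts", "-Xms512m -Xmx1g");
        conf.set("mapreduce.map.java.opts", "-Xms512m -Xmx1g");
        conf.set("mapreduce.reduce.java.opts", "-Xms512m -Xmx1g");

        // Option 2: cap virtual memory at roughly 2x the heap (value is in kilobytes).
        conf.setLong("mapred.child.ulimit", 2L * 1024 * 1024);

        // Option 3: limit how many map/reduce tasks run in parallel on each node.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 2);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);

        // Option 4: keep the sort buffer between 0.25x and 0.5x of the child heap (here 256 MB of 1 GB).
        conf.setInt("io.sort.mb", 256);

        Job job = Job.getInstance(conf, "snappy-input-job"); // hypothetical job name
        // ... set input/output formats, paths, mapper/reducer, then job.waitForCompletion(true)
    }
}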

In the end, it's trial and error, so try these and see which one sticks.
