Java Heap Space in MapReduce

I am running a MapReduce job on a machine with 32 GB of RAM, but I get a Java heap space error. I set yarn.nodemanager.resource.memory-mb to 32 GB, hoping that would give me enough memory to run the tasks, but apparently not. How should I configure MapReduce v2 to avoid this problem?

EDIT:

16/08/30 19:00:49 INFO mapreduce.Job: Task Id : attempt_1472579604725_0003_m_000000_0, Status : FAILED
Error: Java heap space
16/08/30 19:00:55 INFO mapreduce.Job: Task Id : attempt_1472579604725_0003_m_000000_1, Status : FAILED
Error: Java heap space
16/08/30 19:01:00 INFO mapreduce.Job: Task Id : attempt_1472579604725_0003_m_000000_2, Status : FAILED
Error: Java heap space

[2] mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<configuration>
 <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property>
 <property> <name>mapreduce.jobhistory.done-dir</name> <value>/root/Programs/hadoop/logs/history/done</value> </property>
 <property> <name>mapreduce.jobhistory.intermediate-done-dir</name> <value>/root/Programs/hadoop/logs/history/intermediate-done-dir</value> </property>
 <property> <name>mapreduce.job.reduces</name> <value>2</value> </property>

 <!-- property> <name>yarn.nodemanager.resource.memory-mb</name> <value>10240</value> </property>
 <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property -->

<!-- property><name>mapreduce.task.files.preserve.failedtasks</name><value>true</value></property>
<property><name>mapreduce.task.files.preserve.filepattern</name><value>*</value></property -->
</configuration>

[3] yarn-site.xml

<configuration>
 <property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property>
 <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property>  
 <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property>
 <property> <name>yarn.resourcemanager.resource-tracker.address</name> <value>s8:8025</value> </property>
 <property> <name>yarn.resourcemanager.scheduler.address</name> <value>s8:8030</value> </property>
 <property> <name>yarn.resourcemanager.address</name> <value>s8:8032</value> </property>
 <property> <name>yarn.log.server.url</name> <value>http://s8:19888/jobhistory/logs/</value> </property> 

 <!-- job history -->
 <property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property>
 <property> <name>yarn.nodemanager.log.retain-seconds</name> <value>900000</value> </property>
 <property> <name>yarn.nodemanager.remote-app-log-dir</name> <value>/app-logs</value> </property>

 <!-- proxy -->
 <property><name>yarn.web-proxy.address</name><value>s8:9046</value></property>

 <!-- to check the classpath in yarn, do yarn classpath -->
 <!-- compress output data -->
 <property><name>mapreduce.output.fileoutputformat.compress</name><value>false</value></property>
 <property><name>mapreduce.output.fileoutputformat.compress.codec</name><value>org.apache.hadoop.io.compress.BZip2Codec</value></property>

 <!-- Node configuration -->
   <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>33554432</value> </property>
</configuration>

The yarn.nodemanager.resource.memory-mb parameter tells YARN how much memory is available for containers on each node, and the value is given in megabytes (repeated from comments).
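
For example, a minimal sketch of the yarn-site.xml entry, assuming you want to offer 32 GB per node to YARN (32 GB = 32768 MB):

 <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>32768</value> </property> <!-- 32 GB, expressed in MB -->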

If you want your MapReduce program to use those resources, you should set the following parameters:

mapreduce.map.memory.mb

mapreduce.map.java.opts

mapreduce.reduce.memory.mb

mapreduce.reduce.java.opts

Just make sure you set the heap size in java.opts (the -Xmx flag) to 10-20% less than memory.mb, so the container has headroom for non-heap JVM memory.
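
For example, a minimal sketch of the corresponding mapred-site.xml entries, assuming 4 GB map containers and 8 GB reduce containers (illustrative values, size them for your job):

 <property> <name>mapreduce.map.memory.mb</name> <value>4096</value> </property>
 <property> <name>mapreduce.map.java.opts</name> <value>-Xmx3276m</value> </property> <!-- ~80% of 4096 -->
 <property> <name>mapreduce.reduce.memory.mb</name> <value>8192</value> </property>
 <property> <name>mapreduce.reduce.java.opts</name> <value>-Xmx6553m</value> </property> <!-- ~80% of 8192 -->

You can also override these per job on the command line (the jar and class names below are placeholders, and -D options are only picked up if your driver uses ToolRunner/GenericOptionsParser):

 hadoop jar myjob.jar MyDriver -D mapreduce.map.memory.mb=4096 -D mapreduce.map.java.opts=-Xmx3276m <input> <output>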
