"Java heap space" error when giving an entire folder as input to a MapReduce program
I'm getting a "Java heap space" error when I run my MapReduce program with an entire folder as input to the MR job. When I give a single file as input, there is no error and the job runs successfully.
Changes I tried in hadoop-env.sh:
=====================================
I increased the client heap size from 1024 MB to 2048 MB:
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
Changes in mapred-site.xml:
===========================
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
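For what it's worth, the same per-task heap setting can also be applied per job from the driver instead of cluster-wide. A minimal sketch, assuming the standard `Configuration` API; note that on Hadoop 2+ `mapred.child.java.opts` is deprecated in favor of the separate map/reduce properties shown below (verify the names against your Hadoop version):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: raise the child-JVM heap for this job only, from the driver.
Configuration conf = new Configuration();
conf.set("mapred.child.java.opts", "-Xmx2048m");     // Hadoop 1.x property
conf.set("mapreduce.map.java.opts", "-Xmx2048m");    // Hadoop 2.x, map tasks
conf.set("mapreduce.reduce.java.opts", "-Xmx2048m"); // Hadoop 2.x, reduce tasks
```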
Even after making these changes, I'm still getting the "Java heap space" error.
Can anyone please advise me on this issue?
You can turn on HPROF profiling for your job with something like this:
conf.setBoolean("mapred.task.profile", true);
conf.set("mapred.task.profile.params", "-agentlib:hprof=cpu=samples," + "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
conf.set("mapred.task.profile.maps", "0-2");
conf.set("mapred.task.profile.reduces", "0-2");
This will help you diagnose what exhausted the heap. See "Hadoop: The Definitive Guide", pages 178-181, for more details.
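A frequent culprit that heap profiling surfaces in cases like this is a mapper or reducer that buffers records in memory instead of streaming them: it works on one file but blows the heap on a whole folder. A hypothetical, self-contained illustration of the two shapes (class and method names are mine, not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class StreamVsBuffer {
    // Anti-pattern: collect every transformed record into a list before
    // emitting, so heap use grows linearly with input size.
    static List<String> bufferAll(Iterable<String> records) {
        List<String> all = new ArrayList<>();
        for (String r : records) {
            all.add(r.toUpperCase());
        }
        return all;
    }

    // Streaming shape: emit each record as it is produced, so memory use
    // stays constant no matter how many input files are processed.
    static void streamEach(Iterable<String> records, Consumer<String> emit) {
        for (String r : records) {
            emit.accept(r.toUpperCase());
        }
    }
}
```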