
Out of Memory Error in Hadoop

I tried installing Hadoop following this http://hadoop.apache.org/common/docs/stable/single_node_setup.html document. When I tried executing this:

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 

I am getting the following exception:

java.lang.OutOfMemoryError: Java heap space

Please suggest a solution so that I can try out the example. The entire exception is listed below. I am new to Hadoop and might have done something dumb. Any suggestion will be highly appreciated.

anuj@anuj-VPCEA13EN:~/hadoop$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/12/11 17:38:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/11 17:38:22 INFO mapred.FileInputFormat: Total input paths to process : 7
11/12/11 17:38:22 INFO mapred.JobClient: Running job: job_local_0001
11/12/11 17:38:22 INFO util.ProcessTree: setsid exited with exit code 0
11/12/11 17:38:22 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e49dcd
11/12/11 17:38:22 INFO mapred.MapTask: numReduceTasks: 1
11/12/11 17:38:22 INFO mapred.MapTask: io.sort.mb = 100
11/12/11 17:38:22 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
11/12/11 17:38:23 INFO mapred.JobClient:  map 0% reduce 0%
11/12/11 17:38:23 INFO mapred.JobClient: Job complete: job_local_0001
11/12/11 17:38:23 INFO mapred.JobClient: Counters: 0
11/12/11 17:38:23 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1257)
    at org.apache.hadoop.examples.Grep.run(Grep.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Grep.main(Grep.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

For anyone using RPM or DEB packages, the documentation and common advice are misleading. These packages install the Hadoop configuration files into /etc/hadoop, and those files take priority over other settings.

/etc/hadoop/hadoop-env.sh sets the maximum Java heap memory for Hadoop; by default it is:

export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS"

This Xmx setting is too low. Simply change it to the following and rerun:

export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
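
To check that the new value is actually picked up, a quick sanity check (a sketch; it just sources the file these packages install and prints the variable) is:

. /etc/hadoop/hadoop-env.sh
echo $HADOOP_CLIENT_OPTS    # should now contain -Xmx2048m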

You can assign more memory by editing the conf/mapred-site.xml file and adding the property:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

This will start the Hadoop JVMs with more heap space.

Another possibility is editing hadoop-env.sh, which contains export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS". Changing 128m to 1024m helped in my case (Hadoop 1.0.0.1 on Debian).

After trying many combinations, I finally concluded that the same error in my environment (Ubuntu 12.04, Hadoop 1.0.4) was due to two issues:

  1. Same as Zach Garner mentioned above.
  2. Don't forget to execute "ssh localhost" first. Believe it or not, not having ssh working will throw a Java heap space error as well (a minimal passwordless-ssh setup is sketched below).
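
For reference, a minimal passwordless-ssh setup for localhost (assuming OpenSSH is installed, along the lines of the single-node setup document) looks roughly like this:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost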

You need to make adjustments to mapreduce.{map|reduce}.java.opts and also to mapreduce.{map|reduce}.memory.mb.

For example:

  hadoop jar <jarName> <fqcn> \
      -Dmapreduce.map.memory.mb=4096 \
      -Dmapreduce.map.java.opts=-Xmx3686m
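
If you prefer to make this permanent rather than passing it per job, the same example values can be put into mapred-site.xml; a sketch, assuming the YARN/MRv2 property names above:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx3686m</value>
  </property>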

Here is a good resource with an answer to this question.

You can solve this problem by editing the file /etc/hadoop/hadoop-env.sh.

Hadoop was giving the /etc/hadoop config directory precedence over the conf directory.

I also ran into the same situation.

We faced the same situation.

Modifying hadoop-env.sh worked for me.

export HADOOP_HEAPSIZE will be commented out; uncomment it and provide a size of your choice.

By default, the assigned HEAPSIZE is 1000 MB.
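
As a sketch, after uncommenting, the line in hadoop-env.sh looks something like this (2000 MB is just an example value):

export HADOOP_HEAPSIZE=2000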

I installed Hadoop 1.0.4 from the binary tar and had the out-of-memory problem. I tried Tudor's, Zach Garner's, Nishant Nagwani's, and Andris Birkmanis's solutions, but none of them worked for me.

Editing bin/hadoop to ignore $HADOOP_CLIENT_OPTS worked for me:

...
elif [ "$COMMAND" = "jar" ] ; then
    CLASS=org.apache.hadoop.util.RunJar
    # changed this line to avoid the out-of-memory error:
    #HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    # changed to:
    HADOOP_OPTS="$HADOOP_OPTS "
...

I'm assuming there is a better way to do this, but I could not find it.

I hit the same exception on Ubuntu with Hadoop 1.1.1. The solution was simple: edit the shell variable $HADOOP_CLIENT_OPTS set by some init script. But it took a long time to find it =(

Run your job like the one below:

bin/hadoop jar hadoop-examples-*.jar grep -D mapred.child.java.opts=-Xmx1024M input output 'dfs[a-z.]+' 

The heap space is set to 32 MB or 64 MB by default. You can increase the heap space in the properties file, as Tudor pointed out, or you can change it for this particular job by setting this property on the job itself.

Make sure mapreduce.child.java.opts has sufficient memory to run the mapred job. Also ensure that mapreduce.task.io.sort.mb is less than the heap given in mapreduce.child.java.opts.

Example:

 mapreduce.child.java.opts=-Xmx2048m

 mapreduce.task.io.sort.mb=100

Otherwise you'll hit the OOM issue even if HADOOP_CLIENT_OPTS in hadoop-env.sh is configured with enough memory.

Configure the JVM heap size for your map and reduce processes. These sizes need to be less than the physical memory configured for the containers (the YARN memory settings). As a general rule, they should be 80% of the size of the YARN physical memory settings.

Configure mapreduce.map.java.opts and mapreduce.reduce.java.opts to set the map and reduce heap sizes respectively, e.g.:

<property>  
   <name>mapreduce.map.java.opts</name>  
   <value>-Xmx1638m</value>
</property>
<property>  
   <name>mapreduce.reduce.java.opts</name>  
   <value>-Xmx3278m</value>
</property>
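
For what it's worth, those heap values line up with the 80% rule above if the container sizes are assumed to be 2048 MB for map and 4096 MB for reduce (0.8 × 2048 ≈ 1638, 0.8 × 4096 ≈ 3278); a sketch of the matching container settings:

<property>
   <name>mapreduce.map.memory.mb</name>
   <value>2048</value>
</property>
<property>
   <name>mapreduce.reduce.memory.mb</name>
   <value>4096</value>
</property>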

Exporting the variables by running the following command worked for me:

. conf/hadoop-env.sh

On Ubuntu using the DEB install (at least for Hadoop 1.2.1) there is a /etc/profile.d/hadoop-env.sh symlink created to /etc/hadoop/hadoop-env.sh, which causes it to load every time you log in. In my experience this is not necessary, as the /usr/bin/hadoop wrapper itself will eventually call it (through /usr/libexec/hadoop-config.sh). On my system I removed the symlink and I no longer get weird issues when changing the value for -Xmx in HADOOP_CLIENT_OPTS (because every time that hadoop-env.sh script is run, the client options environment variable is updated while keeping the old value).
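
In case it helps, removing the symlink is a one-liner (the path is the one described above; adjust it if your package puts it elsewhere):

sudo rm /etc/profile.d/hadoop-env.sh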

I ran into a very similar issue last week. The input file I was using had a huge line in it which I could not view; that line was almost 95% of my file size (95% of 1 GB, imagine that!). I would suggest you take a look at your input files first. You might have a malformed input file that you want to look into. Try increasing heap space after you check the input file.
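
A quick way to spot such a line, as a sketch (the file name is just a placeholder), is to print the longest line's number and length with awk:

awk 'length($0) > max { max = length($0); line = NR } END { print "longest line:", line, "length:", max }' input/yourfile.txt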

If you are using Hadoop on Amazon EMR, a configuration can be added to increase the heap size:

[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_HEAPSIZE": "2048"
        },
        "Configurations": []
      }
    ]
  }
]
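
If you create the cluster with the AWS CLI, that JSON can be passed via the --configurations flag; a sketch (the file name, release label, and instance settings are placeholders):

aws emr create-cluster \
    --release-label emr-5.30.0 \
    --applications Name=Hadoop \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --configurations file://./hadoop-heapsize.json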
