Getting error “java.lang.OutOfMemoryError: Java heap space” while running simple mapreduce job

I have been trying to run a simple MapReduce word-count job on RHEL 6, but I consistently get this error. Please help.

13/01/13 19:59:01 INFO mapred.MapTask: io.sort.mb = 100
13/01/13 19:59:01 WARN mapred.LocalJobRunner: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/01/13 19:59:02 INFO mapred.JobClient:  map 0% reduce 0%
13/01/13 19:59:02 INFO mapred.JobClient: Job complete: job_local_0001
13/01/13 19:59:02 INFO mapred.JobClient: Counters: 0

You probably need to increase the JVM settings for maximum heap size (and possibly max perm space).
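
Note that your log shows LocalJobRunner, so the map task runs inside the client JVM itself rather than in a spawned child. As a minimal sketch, assuming Hadoop 1.x (which the log output suggests) and an example heap size you should tune to your machine:

# In conf/hadoop-env.sh, or exported in the shell before launching the job.
# In local mode (LocalJobRunner) the task runs in the client JVM, so this
# is the heap that matters:
export HADOOP_CLIENT_OPTS="-Xmx1024m"

# On a real cluster, the per-task child JVM heap is raised instead via the
# mapred.child.java.opts property in mapred-site.xml, e.g. -Xmx512m.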

I'd recommend running VisualVM while your Hadoop job is running so you can get some visibility into what's going on.

Are you running multiple servers? Maybe you're asking a single server to do too much.

You can use jstat -gcutil to monitor the memory usage of your JVMs. This will show you how fast the heap usage is growing.
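
For example (a minimal sketch; <pid> is a placeholder for the JVM's process id, which you can find with jps):

# Sample GC and heap occupancy every 1000 ms for the given JVM.
jstat -gcutil <pid> 1000

Watch the O (old generation occupancy) and FGC (full GC count) columns: if O stays pinned near 100 while FGC keeps climbing, the heap really is too small for the job.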

Further, you can also enable GC logging; it is lightweight and will show you the same information for each JVM that you instantiate:

-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:LogFile=jvm.log -XX:+HeapDumpOnOutOfMemoryError -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -showversion
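
One way to apply these flags (an assumption based on a standard Hadoop 1.x layout, not something the original answer specifies): append them to HADOOP_OPTS in conf/hadoop-env.sh so that every Hadoop JVM picks them up:

# conf/hadoop-env.sh: add GC logging and a heap dump on OOM to all Hadoop JVMs.
export HADOOP_OPTS="$HADOOP_OPTS -XX:+HeapDumpOnOutOfMemoryError -Xloggc:gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

The heap dump written on OutOfMemoryError can then be opened in VisualVM or Eclipse MAT to see which objects are filling the heap.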
