Hadoop Streaming Memory Usage

I'm wondering where the memory is used in the following job:

  • Hadoop Mapper/Reducer Heap Size: -Xmx2G
  • Streaming API:

    • Mapper: /bin/cat
    • Reducer: wc
  • Input file is a 350 MB file containing a single line full of a's.

This is a simplified version of the real problem we've encountered.
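
For anyone trying to reproduce this, a rough sketch of how such an input file could be generated and loaded into HDFS (the file name and HDFS path are made up):

    # Create a ~350 MB file consisting of a single line of 'a' characters,
    # then copy it into HDFS (local and HDFS paths are hypothetical).
    head -c 350000000 /dev/zero | tr '\0' 'a' > one-line.txt
    printf '\n' >> one-line.txt
    hadoop fs -put one-line.txt /tmp/one-line.txt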

Reading the file from HDFS and constructing a Text object should not take more than 700 MB of heap, assuming Text uses 16 bits per character. I'm not sure about that; I could imagine that Text only uses 8 bits.

So there is this (worst-case) 700 MB line. The line should fit at least twice into the heap, but I always get out-of-memory errors.

Is this a possible bug in Hadoop (e.g. unnecessary copies), or do I just not understand some required memory-intensive steps?

I would be really thankful for any further hints.

The memory given to each child JVM running a task can be changed by setting the mapred.child.java.opts property. The default setting is -Xmx200m, which gives each task 200 MB of memory.
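
For a streaming job you can override this per job by passing a generic -D option before the streaming-specific options. A minimal sketch, assuming a typical Hadoop 1.x layout for the streaming jar and the hypothetical paths from above:

    # Give each map/reduce task a 2 GB heap instead of the 200 MB default
    # (the jar path and the input/output paths are examples; adjust to your setup).
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -D mapred.child.java.opts=-Xmx2048m \
      -input /tmp/one-line.txt \
      -output /tmp/wc-out \
      -mapper /bin/cat \
      -reducer wc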

When you say:

Input file is a 350 MB file containing a single line full of a's.

I'm assuming your file has a single line of all a's with a single newline delimiter.

If that is taken up as the value in the map(key, value) function, I think you might have memory issues, since your task can use only 200 MB and you have a single record in memory that is 350 MB.
