I'm wondering where the memory goes in the following job:

- Heap: -Xmx2G
- Streaming API: /bin/cat as mapper, wc as reducer
- Input: a 350 MB file containing a single line full of a's
This is a simplified version of the real problem we've encountered.
Reading the file from HDFS and constructing a Text object should not amount to more than 700 MB of heap, assuming that Text also uses 16 bits per character. I'm not sure about that; I could imagine that Text only uses 8 bits.
So in the worst case there is this 700 MB line. The line should fit into the heap at least twice, but I always get out-of-memory errors.

Is this a possible bug in Hadoop (e.g., unnecessary copies), or am I just not understanding some required memory-intensive step?

I'd be really thankful for any further hints.
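For what it's worth, the 8-bit-vs-16-bit question can be checked without Hadoop at all: Hadoop's Text stores its contents as UTF-8 bytes (1 byte per ASCII character), while a Java String historically holds UTF-16 code units (2 bytes per character). A small, self-contained sketch of what that means for an all-a's line (the class name and helper methods are mine, not from any Hadoop API):

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Approximate in-memory size of an all-ASCII line under the two encodings:
    // Text buffers UTF-8 bytes, java.lang.String holds UTF-16 code units.
    static long utf8Bytes(long lineLength) {
        return lineLength;      // ASCII characters are 1 byte each in UTF-8
    }

    static long utf16Bytes(long lineLength) {
        return 2L * lineLength; // 2 bytes per character in UTF-16
    }

    public static void main(String[] args) {
        // Sanity-check the UTF-8 size on a tiny sample line of a's
        String sample = "aaaa";
        if (sample.getBytes(StandardCharsets.UTF_8).length != utf8Bytes(sample.length())) {
            throw new AssertionError("unexpected UTF-8 size");
        }

        long line = 350L * 1024 * 1024; // the 350 MB single-line input
        System.out.println("as Text   (UTF-8) : " + utf8Bytes(line) + " bytes");
        System.out.println("as String (UTF-16): " + utf16Bytes(line) + " bytes");
    }
}
```

So the 350 MB line costs roughly 350 MB as a Text, but any step that materializes it as a String (or copies the buffer) can push the footprint toward 700 MB and beyond.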
The memory given to each child JVM running a task can be changed by setting the mapred.child.java.opts
property. The default setting is -Xmx200m, which gives each task 200 MB of memory.
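For example, assuming a typical streaming setup, the limit can be raised per job on the command line (the jar path and input/output paths below are illustrative, not from your job):

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.child.java.opts=-Xmx2048m \
    -input /path/to/input \
    -output /path/to/output \
    -mapper /bin/cat \
    -reducer wc
```

Note that the generic -D option has to come before the streaming-specific options.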
When you say:

Input File is a 350MByte file containing a single line full of a's.

I'm assuming your file has a single line of all a's followed by a single newline delimiter.

If that line is taken up as the value in the map(key, value) function, I think you might have memory issues, since your task can use only 200 MB by default and you have a single record in memory that is 350 MB.