
Apache Pig: java.lang.OutOfMemoryError: Java heap space

I am trying to join two Pig relations:

RELATION1 = LOAD '$path' USING AvroStorage();
RELATION2 = LOAD '$path' USING AvroStorage();
RELATION3 = JOIN RELATION1 BY field, RELATION2 BY field;
STORE RELATION3 INTO '$PATH' USING AvroStorage();

But I am getting the following error:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
Caused by: java.lang.reflect.InvocationTargetException
Caused by: java.lang.OutOfMemoryError: Java heap space

It seems to be complaining that there is not enough heap space. In my case, RELATION1 is relatively large (roughly 1000 GB) and RELATION2 is small. Simply loading RELATION1 in a Pig script and applying a filter works fine. Can someone suggest how I can get around this problem? Thanks!

Since you mention that one of your relations is much smaller than the other, you might want to optimize the join itself. In a regular join, Pig streams the last relation instead of bringing it into memory, so the smaller relation should go first and the larger one last:

RELATION3 = JOIN RELATION2 BY field, RELATION1 BY field;

If one of your relations is small enough to fit into memory, you can do a replicated join. Note that the order is the reverse of the above: the large relation goes first and the small one last:

RELATION3 = JOIN RELATION1 BY field, RELATION2 BY field USING 'replicated';

Additionally, you can use FOREACH statements before the join to keep only the fields you need, so that less data has to be moved around. Also, do any filtering before the join rather than after it.
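As a rough sketch of projecting and filtering before the join, assuming hypothetical field names (col_a, col_b) that are not in your script, and assuming it is acceptable to drop rows whose join key is null (an inner join discards them anyway):

```pig
-- col_a and col_b are hypothetical field names; substitute your own.
-- Keep only the fields needed downstream so less data is shuffled.
R1_SLIM = FOREACH RELATION1 GENERATE field, col_a;
R2_SLIM = FOREACH RELATION2 GENERATE field, col_b;

-- Filter before the join. An inner join never matches null keys,
-- so dropping them early does not change the result.
R1_CLEAN = FILTER R1_SLIM BY field IS NOT NULL;
R2_CLEAN = FILTER R2_SLIM BY field IS NOT NULL;

-- Large relation first, small relation last for a replicated join.
RELATION3 = JOIN R1_CLEAN BY field, R2_CLEAN BY field USING 'replicated';
```

Any other filter you would normally apply to the joined output can usually be moved up to the same place, as long as it only refers to fields from one side.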

If you still get Java memory errors with these modifications, you can change the MapReduce memory settings. For example, this other Stack Overflow answer recommends:

SET mapreduce.map.memory.mb 4096;
SET mapreduce.reduce.memory.mb 6144;
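One thing to keep in mind: on YARN, the memory.mb properties size the task container, while the JVM heap that the error refers to is controlled by the java.opts properties. A minimal sketch, assuming a heap of roughly 80% of the container (a common rule of thumb, not something taken from the linked answer):

```pig
-- Container sizes in MB, as in the settings above.
SET mapreduce.map.memory.mb 4096;
SET mapreduce.reduce.memory.mb 6144;

-- JVM heap inside each container. The -Xmx values are assumptions
-- (about 80% of the container size); tune them for your cluster.
SET mapreduce.map.java.opts '-Xmx3276m';
SET mapreduce.reduce.java.opts '-Xmx4915m';
```

The usable values depend on the per-container limits your cluster allows, so treat these numbers as starting points.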

(Searching for the error message also turns up many other questions and answers with different recommended settings that you can try.)

