
PIG aggregate function - OutOfMemory: Java Heap Space

The following fails:

data = FOREACH rawData GENERATE (int) col;
aggregate = FOREACH data GENERATE MIN(col);

Is there some way I can get the above to work?

I tried this:

data = FOREACH rawData GENERATE 1 dummy, (int) col;
grouped = GROUP data BY dummy;
aggregate = FOREACH grouped GENERATE MIN(data.col);

Now I get: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space

There are literally 11 rows of integers (1..11), so I'm not sure why I'm getting an OutOfMemoryError.

I'm running the script from the command line with pig -f myscript.pig

I'm new to Pig, so if I need to set something, please let me know.

Your intuition was correct: you need to group the data before applying MIN. You can use GROUP ALL for this purpose:

data = FOREACH rawData GENERATE (int) col;
grouped = GROUP data ALL;
aggregate = FOREACH grouped GENERATE MIN($1);

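After a GROUP operation, $0 contains the group key and $1 contains a bag holding all the tuples of the grouped relation (here, data). With GROUP ALL there is a single group, so that bag holds every row. Because data has only one column, MIN($1) gives the same result as MIN(data.col).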
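
To see what GROUP ALL actually produces, it can help to inspect the schema before aggregating. The following is a minimal sketch, not a definitive script: the input file name numbers.txt and the alias min_col are assumptions made for illustration, and the file is assumed to contain one integer per line.

```pig
-- Assumption: 'numbers.txt' is a hypothetical input file, one integer per line.
rawData = LOAD 'numbers.txt' AS (col:chararray);

-- Cast the raw field to int and keep the field name 'col'.
data = FOREACH rawData GENERATE (int) col AS col;

-- Put every row into a single group.
grouped = GROUP data ALL;

-- Print the schema: it should show a 'group' field ($0)
-- and a bag named 'data' ($1) holding the original tuples.
DESCRIBE grouped;

-- Refer to the bag by name instead of by position; 'min_col' is an illustrative alias.
aggregate = FOREACH grouped GENERATE MIN(data.col) AS min_col;

DUMP aggregate;
```

Referring to the bag as data.col rather than $1 keeps the script readable if more columns are added later. For a small test like this, the script can be run in local mode with pig -x local -f myscript.pig.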
